paper – Survey Data Blog

Recommendations on Data Versioning

We often say that “A is a version of B” without explaining what we mean by “version”. We imply that A was somehow derived from B, or that the two share a common ancestor. But how exactly is A related to B? How do they differ? Do they differ in content or in format? What is the significance of this difference? While this sounds like a question about the provenance of a dataset, it goes further and raises questions about the identity of a digital object and the intellectual and creative work it embodies.

The project “PID Reference Model for the Versioning of Research Data” at the Chair of Information Management at the Berlin School of Library and Information Science (IBI), Humboldt-Universität zu Berlin, examined the versioning of research data publications from an infrastructural perspective, in the context of ongoing discussions about reforming research evaluation. As an output of the project, the team led by visiting fellow Jens Klump developed a set of guidelines on how to version research data.

Klump, J., Pampel, H., Rothfritz, L., & Strecker, D. (2024). Recommendations on Data Versioning. Berlin, Germany: Humboldt-Universität zu Berlin. https://doi.org/10.5281/zenodo.13743876

This recommendation outlines key aspects of research data versioning for scientists and information management professionals at research-performing organisations. It is based on prior work by the Research Data Alliance Data Versioning Working Group.

A presentation (in German) by the project lead is available at: https://www.ibi.hu-berlin.de/de/von-uns/bbk/abstracts/ss_24/klump_versionen

Issue 1-2020 of Bausteine Forschungsdatenmanagement published, and call for issue 2-2020

The new issue of Bausteine Forschungsdatenmanagement, with ten contributions on research data management, is online: https://bausteine-fdm.de/issue/view/245

It includes new contributions from the workshop series of the DINI/nestor AG Forschungsdaten (working group on research data) and from the workshop “Forschungsdatenmanagement und -infrastruktur in DFG-Sonderforschungsbereichen” (research data management and infrastructure in DFG Collaborative Research Centres) held by the GRAce project of the Göttingen eResearch Alliance, as well as contributions on the costs and effort of research data management and theses on skills training.

At the same time, we are opening the call for contributions to issue 2-2020 of Bausteine Forschungsdatenmanagement.

Contributions are welcome from anyone who manages research data professionally and advises and supports researchers in working with these data.

The submission deadline is 1 June 2020.

Further information on the range of topics and target audiences is available at: https://bausteine-fdm.de/about

Please pay particular attention to the information on submission and review.

The editorial team looks forward to your contributions!

RWX: Foundations of DDI in Scientific Literature


The current issue of DDI DIRECTIONS, the DDI Alliance newsletter distributed mainly via the [DDI-users] mailing list, features for the first time the column Read-Write-Execute (RWX), which aims to point readers to scholarly publications in the field of metadata. The column could also serve as a model for similar information from the (German-speaking) research data community; anyone interested is welcome to contact Knut Wenzig. Below are the announcement and the text from the newsletter:

The DDI Community has produced a rich store of DDI and metadata-related publications over the last 20 years. Read-Write-Execute (RWX) will highlight some of these existing publications as well as new work as it is produced. This first column will feature some of the foundations of DDI in scientific literature. (Thanks to Achim Wackerow for his suggestions and to Kelly Chatain for editorial assistance.) At the same time, a bibliography of DDI articles, working papers, and presentations is being built and is available at Bibsonomy.org with easily reusable bibliographic metadata. This metadata will also be made available on the DDI Alliance website. Suggestions for papers and topics for RWX, or the bibliography, are appreciated and can be sent to: Knut Wenzig, kwenzig@diw.de.

In her paper “The DDI Matures: 1997 to the Present”, Mary Vardigan (2013), the former Director of the DDI Alliance, presents a timeline of the conceptual and organizational development of DDI. The initial SGML Codebook Committee meeting took place in 1995, but 1997 was the year of the first “instantiation in XML” (Vardigan 2013: 45). In versions 1 and 2, DDI described datasets using the codebook approach, which is still supported and widely used. From version 3 onward, the scope was broadened to document the whole data lifecycle, taking data collection as the starting point and ultimately enabling the repurposing and reuse of DDI elements.^[1] The paper ends with a list of high-level design goals, referred to in “Developing a Model-Driven DDI Specification” (Participants in 2012 Dagstuhl Seminar on DDI Moving Forward, 2012), on which the next version of DDI is based.

The first reference listed in Mary Vardigan’s paper is “Providing Global Access to Distributed Data Through Metadata Standardisation – the Parallel Stories of NESSTAR and the DDI”, submitted by the Norwegian Social Science Data Services and prepared by Jostein Ryssevik (1999). The paper identifies the “relative distance between the end-users of a statistical material and the production process” (p. 2) as the fundamental problem to be solved. As discovery systems were developed to address this problem, the need for metadata standards such as DDI emerged. The authors note that DDI used the then-new XML language and that a DDI document could contain descriptions of the document itself, the study, the data file, the variables, and other study-related materials. Already in this early paper, RDF (Resource Description Framework^[2]) is described as an application “that provides the foundation for metadata interoperability across different resource description communities” (p. 5). With DDI as its metadata language, NESSTAR could deliver a wide range of interconnected services and platforms. Even though the last release of NESSTAR is more than a year old, the ideas in the article deserve to be revisited, whether or not the software realized them. Using the metaphorical contrast of the bazaar and the cathedral, the same authors (Ryssevik 2000) set out a vision, remarkable even then, of metadata systems that cover the complete data lifecycle.
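To make the document structure described above more concrete, here is a minimal Python sketch that builds the skeleton of a DDI Codebook instance with the standard library's xml.etree.ElementTree. It assumes DDI Codebook 2.5 (namespace ddi:codebook:2_5), which is considerably later than the 1999 version discussed in the paper; the study title, variable names and labels are made up, and the output is deliberately incomplete rather than schema-valid. The five top-level sections correspond to the parts listed in the paper: docDscr (the document), stdyDscr (the study), fileDscr (the data file), dataDscr (the variables) and otherMat (other materials).

```python
import xml.etree.ElementTree as ET

# Assumption: DDI Codebook 2.5 namespace; the 1999 DDI discussed above predates it.
DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("", DDI_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the DDI Codebook namespace."""
    return f"{{{DDI_NS}}}{tag}"


def build_codebook(study_title: str, variables: dict[str, str]) -> ET.Element:
    """Build a minimal, NOT schema-complete, DDI Codebook skeleton.

    `variables` maps hypothetical variable names to their labels.
    """
    root = ET.Element(q("codeBook"))
    ET.SubElement(root, q("docDscr"))             # the DDI document itself
    stdy = ET.SubElement(root, q("stdyDscr"))     # the study
    citation = ET.SubElement(stdy, q("citation"))
    titl_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(titl_stmt, q("titl")).text = study_title
    ET.SubElement(root, q("fileDscr"), ID="F1")   # the data file
    data = ET.SubElement(root, q("dataDscr"))     # the variables
    for i, (name, label) in enumerate(variables.items(), start=1):
        var = ET.SubElement(data, q("var"), ID=f"V{i}", name=name)
        ET.SubElement(var, q("labl")).text = label
    ET.SubElement(root, q("otherMat"))            # other study-related materials
    return root


if __name__ == "__main__":
    cb = build_codebook(
        "Example Survey",  # hypothetical study
        {"age": "Age of respondent", "sex": "Sex of respondent"},
    )
    ET.indent(cb)  # pretty-print (Python 3.9+)
    print(ET.tostring(cb, encoding="unicode"))
```

Running the script prints an indented XML skeleton. A real codebook would contain many more required and optional elements and should be validated against the official DDI Codebook schema.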

The article “The Data Documentation Initiative” by Grant Blank and Karsten Boye Rasmussen (2004) was published in Social Science Computer Review, one of the top-ranked academic journals in the “Information Science & Library Science” category. The authors describe the requirements of data documentation in the social sciences, how DDI, as an XML-based standard, can be used to store the information presented in codebooks, and how “standardization creates new opportunities for software development to aid users” (p. 314).

Today, some 20 years later, we can read and re-evaluate these ideas only because people took the time to write them down. In this sense, contributing to the scholarly record should be understood as a best practice and an integral part of software development for the academic community.

[1] Version 3 is described in Vardigan, Heus & Thomas (2008).
[2] Vardigan (2013: 48) expects that RDF will be used in the upcoming version of DDI as a connection to the semantic web.

References

(also available at BibSonomy)

Blank, G. & Rasmussen, K. B. (2004), ‘The Data Documentation Initiative: The Value and Significance of a Worldwide Standard’, Social Science Computer Review 22 (3), 307-318, doi:10.1177/0894439304263144.

Participants in 2012 Dagstuhl Seminar on DDI Moving Forward (2012), Developing a Model-Driven DDI Specification, DDI Working Paper Series (Other Topics) DDI Alliance, doi:10.3886/DDIWorkingPaper04.

Ryssevik, J. & The Norwegian Social Science Data Services (1999), Providing Global Access to Distributed Data through Metadata Standardisation – the Parallel Stories of Nesstar and the DDI, Conference of European Statisticians, UN/ECE Work Session on Statistical Metadata (Geneva, Switzerland, 22-24 September 1999), Working Paper 10, http://www.unece.org/stats/documents/1999/09/metis/10.e.pdf.

Ryssevik, J. & The Norwegian Social Science Data Services (2000), Bazaar Style Metadata in the Age of the Web – An ‘Open Source’ Approach to Metadata Development, Conference of European Statisticians, UN/ECE Work Session on Statistical Metadata (Washington D.C., United States, 28-30 November 2000), Working Paper 4, http://www.unece.org/fileadmin/DAM/stats/documents/2000/11/metis/4.e.pdf.

Vardigan, M.; Heus, P. & Thomas, W. (2008), ‘Data Documentation Initiative: Toward a Standard for the Social Sciences’, International Journal of Digital Curation 3 (1), 107-113, doi:10.2218/ijdc.v3i1.45.

Vardigan, M. (2013), ‘The DDI Matures: 1997 to the Present’, IASSIST Quarterly 37 (1–4), 45-50, http://www.iassistdata.org/sites/default/files/iq/iqvol371_4_vardigan.pdf.

Publication venues for contributions on research data centre (FDZ) topics


Where can contributions on topics related to the operational work of research data centres be published? A list, which currently focuses mainly on metadata/DDI, is available on BibSonomy.org. In the medium term, it could grow into a bibliography. Further suggestions are welcome in the comments.

[This post was motivated by discussions at this event.]

Working with the PASS data: User Guide Examples in SPSS

While the PASS Scientific Use File has been available in SPSS format since wave 1, there have so far been no PASS support documents for SPSS users. With the recent release of a new Quick Start File, the PASS team now provides all worked examples from the PASS User Guide, originally written for Stata, as SPSS/PASW code. These include examples of merging household, individual, spell and weight datasets, as well as of using the cross-sectional and longitudinal weights for projections to different populations.
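The Quick Start File itself consists of SPSS/PASW syntax; the Python sketch below only illustrates the logic behind two of the steps mentioned above: a many-to-one merge of person records onto household records, and a weighted projection to the population. All record layouts, variable names (hh_id, person_id, cross_weight) and weight values are hypothetical and do not correspond to the actual PASS variables. Spell data and longitudinal weights are not covered.

```python
from dataclasses import dataclass


# Hypothetical record layouts; the real PASS variables are documented
# in the PASS User Guide and differ from these.
@dataclass
class Household:
    hh_id: int
    region: str


@dataclass
class Person:
    hh_id: int
    person_id: int
    age: int
    cross_weight: float  # hypothetical cross-sectional person weight


def merge_persons_households(persons: list[Person], households: list[Household]) -> list[dict]:
    """Many-to-one merge: attach household attributes to each person record."""
    hh_by_id = {h.hh_id: h for h in households}
    merged = []
    for p in persons:
        h = hh_by_id.get(p.hh_id)
        if h is None:
            # Fail loudly instead of silently dropping unmatched persons.
            raise KeyError(f"person {p.person_id} has no household {p.hh_id}")
        merged.append({
            "hh_id": p.hh_id,
            "person_id": p.person_id,
            "age": p.age,
            "region": h.region,
            "weight": p.cross_weight,
        })
    return merged


def weighted_total(rows: list[dict], condition) -> float:
    """Projected population count: sum of the weights of rows meeting `condition`."""
    return sum(r["weight"] for r in rows if condition(r))


if __name__ == "__main__":
    households = [Household(1, "East"), Household(2, "West")]
    persons = [
        Person(1, 101, 34, 1200.0),
        Person(1, 102, 36, 1150.0),
        Person(2, 201, 58, 900.0),
    ]
    rows = merge_persons_households(persons, households)
    print(weighted_total(rows, lambda r: r["region"] == "East"))  # → 2350.0
```

Keying the household records by ID makes the merge raise an error when a person has no matching household, which is usually preferable to silently losing records before weighting.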

PASS Quick Start File – Analysing the PASS data using SPSS/PASW

Paper: User-focused threat identification for anonymised microdata

When producing anonymised microdata for research, national statistical institutes (NSIs) identify a number of ‘risk scenarios’ describing how intruders might seek to attack a confidential dataset. In their paper “User-focused threat identification for anonymised microdata” (PDF), Hans-Peter Hafner, Felix Ritchie and Rainer Lenz argue that the strategy used to identify confidentiality protection measures can be seriously misguided, mainly because the scenarios focus on data protection without sufficient reference to other aspects of the data. The paper brings together a number of findings to examine how this problem can be addressed in practice. Using the creation of a scientific use file as an example, it demonstrates that an alternative perspective can lead to dramatically different outcomes. (Source: Authors’ abstract)