RWX: Top Impact Publications from the Last 20 Years

In the current DDI DIRECTIONS newsletter of the DDI Alliance, which is distributed mainly via the [DDI-users] mailing list, the Read-Write-Execute (RWX) column has appeared again; it aims to point to scholarly publications in the field of metadata:

DDI DIRECTIONS: Logo of the DDI Alliance’s Newsletter

The DDI Community has produced a rich store of DDI and metadata-related publications. Read-Write-Execute (RWX) will highlight some of these existing publications as well as new work as it is produced. The first column featured some of the foundations of DDI in scientific literature. This second column will revisit some of the top impact publications related to DDI from the last 20 years.

It is not surprising that the most-cited DDI publications cover high-level discussions rather than specific technical details. But revisiting conceptual fundamentals or policy goals, comparing standards, and evaluating approaches is also worthwhile when planning the next project. So, let’s take a look at some of the top-cited DDI publications of the last 20 years.

When Ryssevik and Musgrave (2001) wrote about their social science dream machine, they were thinking about the distributed NESSTAR system, which is based on DDI. But there is nothing wrong with the idea of an “integrated resource discovery gateway and search system to identify and locate these resources” that covers no less than “all existing empirical data” (what is today called federated search). And being able to convert an “extensive amount of metadata … totally integrated with the data as such” to a number of formats and copy them to a local machine is a reasonable wish. The same holds true for “an efficient feedback system to the body of metadata, allowing the user to add to the collecting memory of a data set”. Even “The FAIR Guiding Principles for scientific data management and stewardship” (doi:10.1038/sdata.2016.18) from 2016, which are considered to be state of the art, do not cover the full range of features Ryssevik and Musgrave describe.

The most cited publication from 2004 contains an important reminder: “Technology itself, however, will not fulfill the promise of e-science. Information and communication technologies provide the physical infrastructure. It is up to national governments, international agencies, research institutions, and scientists themselves to ensure the institutional, financial and economic, legal, and cultural and behavioural aspects of data sharing are taken into account.” (Arzberger et al. 2004: 137) The use of DDI, especially at ICPSR, serves as a use case for the technological domain, where access to, usability of, and multiple use of the data must be assured by interoperability.

While Arzberger et al. look at use cases from different disciplines in the different identified domains, Willis, Greenberg and White (2012) compare nine metadata standards in order to understand their similarities and differences. They regard DDI as the standard for describing social science statistical data from experimental, observational, and statistical studies. The objective of covering the whole data lifecycle is unique to DDI. DDI is one of two standards which “are intended to be comprehensive, yet support instances of description using a minimal number of required elements.” They conclude that metadata scheme creation depends more on the goals than on the discipline or type of data described (p. 1517). At the same time, the common discipline-specific approach contributes “to artificial boundaries between disciplines and impede interdisciplinary and transdisciplinary reuse” (p. 1516).

For Jeffery et al. (2014), who describe the CERIF approach to designing a research information management system, domain-specific metadata standards form the lowest of three levels of information. The first level consists of information on research output (organized by flat metadata like Dublin Core, similar to a catalogue card). The second level consists of contextual metadata, which can generate the discovery metadata of level one and point to the domain metadata of level three (which could be DDI). The contextual metadata hold information about base entities (e.g., persons and publications) and connect them through a semantic layer of flexible link entities. Each link entity expresses a role, defined by a term that captures the semantics and by the controlled vocabulary to which the term belongs (p. 10), and has a start and an end date. Using this semantic layer, a publication can have an author, a publication date, and even a country of publication (using so-called localisation entities).
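The link-entity idea described above can be illustrated with a small data-structure sketch. This is not the CERIF schema: the class names (`BaseEntity`, `Role`, `LinkEntity`), the helper `linked_sources`, the vocabulary name, and all example values are hypothetical and chosen only to show how a role (term plus controlled vocabulary) and a validity period can be attached to a link between two base entities.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class BaseEntity:
    """A base entity such as a person, a publication, or a country."""
    kind: str
    identifier: str
    label: str


@dataclass(frozen=True)
class Role:
    """A role: a term plus the controlled vocabulary the term belongs to."""
    term: str
    vocabulary: str


@dataclass(frozen=True)
class LinkEntity:
    """Connects two base entities with a role and an optional validity period."""
    source: BaseEntity
    target: BaseEntity
    role: Role
    start: Optional[date] = None
    end: Optional[date] = None

    def active_on(self, day: date) -> bool:
        """True if the link is valid on the given day (open bounds allowed)."""
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def linked_sources(links: List[LinkEntity], target: BaseEntity, term: str) -> List[BaseEntity]:
    """Return all entities linked to `target` under the role `term`."""
    return [l.source for l in links if l.target == target and l.role.term == term]


# Hypothetical example data, for illustration only.
person = BaseEntity("person", "p1", "Jane Doe")
paper = BaseEntity("publication", "pub1", "An Example Paper")
country = BaseEntity("country", "NO", "Norway")

links = [
    LinkEntity(person, paper, Role("author", "example-roles"), start=date(2014, 1, 1)),
    LinkEntity(country, paper, Role("country of publication", "example-roles")),
]

print([e.label for e in linked_sources(links, paper, "author")])
```

Because roles live on the link rather than on the entities, a new kind of relationship only needs a new term in the vocabulary, not a change to the entity definitions.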

This small list of four top publications related to DDI:

  • shows us that looking more than 15 years back might yield new insights into new products from old ideas,
  • reminds us that technology does not solve social problems,
  • reveals different perspectives on the discipline specific fragmentation of metadata standards,
  • and gives an insight into a concept of a flexible and expressive linking mechanism.

References

(also available at Bibsonomy)

Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., Moorman, D., Uhlir, P. & Wouters, P. (2004). Promoting Access to Public Research Data for Scientific, Economic, and Social Development. Data Science Journal, 3, 135-152. doi:10.2481/dsj.3.135

Jeffery, K., Houssos, N., Jörg, B. & Asserson, A. (2014). Research Information management: the CERIF approach. International Journal of Metadata, Semantics and Ontologies, 9, 5-14. doi:10.1504/ijmso.2014.059142

Ryssevik, J. & Musgrave, S. (2001). The Social Science Dream Machine: Resource Discovery, Analysis, and Delivery on the Web. Social Science Computer Review, 19, 163-174. doi:10.1177/089443930101900203

Willis, C., Greenberg, J. & White, H. (2012). Analysis and Synthesis of Metadata Goals for Scientific Data. Journal of the American Society for Information Science and Technology, 63, 1505–1520. doi:10.1002/asi.22683

A bibliography of DDI articles, working papers, and presentations is being built and is available at Bibsonomy.org with easily reusable bibliographic metadata. This metadata will also be made available on the DDI Alliance website. Suggestions for papers and topics for RWX, or the bibliography, are appreciated and can be sent to: Knut Wenzig, kwenzig@diw.de

RWX: Foundations of DDI in Scientific Literature

DDI DIRECTIONS: Logo of the DDI Alliance's Newsletter


In the current DDI DIRECTIONS newsletter of the DDI Alliance, which is distributed mainly via the [DDI-users] mailing list, the Read-Write-Execute (RWX) column has appeared for the first time; it aims to point to scholarly publications in the field of metadata. The column could also serve as a model for similar information from the (German-speaking) research data community; anyone interested is welcome to contact Knut Wenzig. Here are the announcement and the text from the newsletter:

The DDI Community has produced a rich store of DDI and metadata-related publications over the last 20 years. Read-Write-Execute (RWX) will highlight some of these existing publications as well as new work as it is produced. This first column will feature some of the foundations of DDI in scientific literature. (Thanks to Achim Wackerow for his suggestions and to Kelly Chatain for editorial assistance.) At the same time, a bibliography of DDI articles, working papers, and presentations is being built and is available at Bibsonomy.org with easily reusable bibliographic metadata. This metadata will also be made available on the DDI Alliance website. Suggestions for papers and topics for RWX, or the bibliography, are appreciated and can be sent to: Knut Wenzig, kwenzig@diw.de.

In her paper “The DDI matures: 1997 to the Present”, Mary Vardigan (2013), the former Director of the DDI Alliance, presents a timeline of the conceptual and organizational development of DDI. The initial SGML Codebook Committee meeting occurred in 1995, but 1997 was the year of the first “instantiation in XML” (Vardigan 2013: 45). DDI started out (in versions 1 and 2) describing data sets with the codebook approach, which is still supported and widely in use. From version 3 on, the scope was broadened to document the whole lifecycle of data, using data collection as a starting point, and finally enabling repurposing and reuse of DDI elements.[1] The paper ends with a list of high-level design goals, referred to in “Developing a Model-Driven DDI Specification” (Participants in 2012 Dagstuhl Seminar on DDI Moving Forward, 2012), on which the next version of DDI is based.

The first reference listed in Mary Vardigan’s paper is “Providing Global Access to Distributed Data Through Metadata Standardisation – the Parallel Stories of NESSTAR and the DDI”, submitted by the Norwegian Social Science Data Services and prepared by Jostein Ryssevik (1999). The “relative distance between the end-users of a statistical material and the production process” (p. 2) was identified as the fundamental problem to be solved. As discovery systems were provided to address this problem, the need for metadata standards like DDI emerged. The authors recall that DDI used the (at the time) new XML language, and that the defined XML code could contain the description of the document itself, of the study, the file, the variables, and other study-related materials. Already in this early paper, RDF (Resource Description Framework[2]) is described as an application “that provides the foundation for metadata interoperability across different resource description communities.” (p. 5) Using DDI as a language, the medium NESSTAR could deliver a great range of interconnected services and platforms. Even if the last release of NESSTAR is more than one year old, the ideas in the article – whether or not realized by the software – deserve to be revisited. Using the metaphorical opposition of bazaars and cathedrals, the same authors (Ryssevik 2000) conceptualize their vision of metadata systems that, even then, cover the complete life-cycle.
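The five parts of a codebook mentioned above (document, study, file, variables, other materials) can be sketched as a skeleton XML document. The element names below follow the DDI Codebook 2.x vocabulary as far as it is recalled here (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`); the nesting is heavily simplified, namespaces are omitted, and the result has not been validated against the schema. The function name `build_codebook` and all example values are hypothetical.

```python
import xml.etree.ElementTree as ET


def _add_title(parent, text):
    """Attach a simplified citation/titlStmt/titl chain to a section."""
    citation = ET.SubElement(parent, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = text


def build_codebook(study_title, file_name, variables, other_material):
    """Build a simplified codebook skeleton; `variables` maps names to labels."""
    root = ET.Element("codeBook")

    # Description of the documentation (the codebook) itself.
    doc = ET.SubElement(root, "docDscr")
    _add_title(doc, "Codebook for " + study_title)

    # Description of the study.
    study = ET.SubElement(root, "stdyDscr")
    _add_title(study, study_title)

    # Description of the data file.
    file_dscr = ET.SubElement(root, "fileDscr")
    file_txt = ET.SubElement(file_dscr, "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name

    # Description of the variables.
    data = ET.SubElement(root, "dataDscr")
    for name, label in variables.items():
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label

    # Other study-related materials.
    other = ET.SubElement(root, "otherMat")
    ET.SubElement(other, "labl").text = other_material

    return root


# Hypothetical example values, for illustration only.
codebook = build_codebook(
    study_title="Example Survey 1999",
    file_name="example.dat",
    variables={"age": "Age of respondent", "sex": "Sex of respondent"},
    other_material="Questionnaire (PDF)",
)
print(ET.tostring(codebook, encoding="unicode"))
```

Keeping all five descriptions in one document is what lets a discovery system search at the study level and drill down to individual variables from the same metadata record.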

The article “The Data Documentation Initiative”, by Grant Blank and Karsten Boye Rasmussen (2004), was published in Social Science Computer Review, one of the top ranked academic journals in the “Information Science & Library Science” category. The authors describe the requirements of data documentation in the social sciences, how DDI as an XML based standard can be used to store information presented in codebooks, and how “standardization creates new opportunities for software development to aid users.” (p. 314)

Today, after 20 years, we can read and reevaluate those ideas only because people took the time to write them down. In this sense contributing to the scientific inventories of knowledge should be understood as a best practice and an integral part of software development for the academic community.

[1] Version 3 is described in Vardigan, Heus & Thomas (2008).
[2] Vardigan (2013: 48) expects that RDF will be used in the upcoming version of DDI as a connection to the semantic web.

References

(also available at Bibsonomy)

Blank, G. & Rasmussen, K. B. (2004), ‘The Data Documentation Initiative: The Value and Significance of a Worldwide Standard’, Social Science Computer Review 22 (3), 307-318, doi:10.1177/0894439304263144.

Participants in 2012 Dagstuhl Seminar on DDI Moving Forward (2012), Developing a Model-Driven DDI Specification, DDI Working Paper Series (Other Topics) DDI Alliance, doi:10.3886/DDIWorkingPaper04.

Ryssevik, J. & The Norwegian Social Science Data Services (1999), Providing Global Access to Distributed Data through Metadata Standardisation–the Parallel Stories of Nesstar and the DDI, Conference of European Statistics, UN/ECE Work Session on Statistical Metadata (Geneva, Switzerland, 22-24 September 1999), Working Paper 10, http://www.unece.org/stats/documents/1999/09/metis/10.e.pdf.

Ryssevik, J. & The Norwegian Social Science Data Services (2000), Bazaar Style Metadata in the Age of the Web–An ‘Open Source’ Approach to Metadata Development, Conference of European Statistics, UN/ECE Work Session on Statistical Metadata (Washington D.C., United States, 28-30 November 2000), Working Paper 4, http://www.unece.org/fileadmin/DAM/stats/documents/2000/11/metis/4.e.pdf.

Vardigan, M., Heus, P. & Thomas, W. (2008), ‘Data Documentation Initiative: Toward a Standard for the Social Sciences’, International Journal of Digital Curation 3 (1), 107-113, doi:10.2218/ijdc.v3i1.45.

Vardigan, M. (2013), ‘The DDI Matures: 1997 to the Present’, IASSIST Quarterly 37 (1–4), 45-50, http://www.iassistdata.org/sites/default/files/iq/iqvol371_4_vardigan.pdf.