Knowledge of character sets and encodings

Standard Developer Shirt, license: (CC BY 2.0), author: https://www.flickr.com/photos/acidpix/

Research data centres do a lot of programming. Anyone who programs is developing software. In an older but still worthwhile post, Joel Spolsky defines a minimum level of knowledge about character sets and encodings: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Workshop “Data Preparation and Documentation” on 22/23 February 2016 in Berlin

DIW Berlin.

The SOEP is hosting the next workshop “Data Preparation and Documentation” on 22/23 February 2016 at DIW in Berlin.

As last year, the workshop takes place just ahead of the Workshop of the German-Language Panel Surveys, by then in its 10th edition, which is planned to follow directly on 23 and 24 February 2016.

Registration for both events is now open.

Opening Stata 13 dta files in R

Screenshot: packages being loaded in an R script.

For a long time the foreign package served well for reading and writing Stata data files (file extension dta). Unfortunately, development of this functionality in the package has been frozen at Stata version 12. Files written by Stata 13 are not supported.
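To find out whether a given dta file falls on the wrong side of that cut-off, it can help to look at its first bytes. The following is a minimal Python sketch and not part of any of the packages discussed here. It assumes that newer dta files (format 117, introduced with Stata 13) begin with an XML-like header naming the release, while older files store the format number in the first byte, and that 115 is the format written by Stata 12. The function names are hypothetical.

```python
import re


def dta_release(header: bytes) -> int:
    """Return the dta format number found in the first bytes of a file."""
    if not header:
        raise ValueError("empty header")
    # Assumption: formats 117 and later open with an XML-like release tag.
    match = re.match(rb"<stata_dta><header><release>(\d+)</release>", header)
    if match:
        return int(match.group(1))
    # Assumption: older formats keep the format number in the first byte.
    return header[0]


def needs_newer_reader(header: bytes) -> bool:
    """True if the file is newer than the format assumed for Stata 12 (115)."""
    return dta_release(header) > 115
```

For a real file, passing the first 64 bytes read in binary mode should be enough for this check.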

The help for foreign (see p. 6 of the documentation) names the packages memisc and readstata13 as alternatives. A little research also turns up haven.

What follows are the results of a small test using a slightly extended (construct_test_data.do) auto dataset as saved by Stata 13. Bilingual labels, umlauts and additional missing values were built into it.

The test winner is readstata13 by Jan Marvin Garbuszus, Sebastian Jeworutzki and others. In the object imported by read.dta13, the label information from the dataset is available as attributes, in both languages.

Second place goes to haven by Hadley Wickham, which uses the ReadStat C library by Evan Miller, in principle a good idea. Version 0.2.0.9000 does import something, but RStudio refuses to display it with view. Unfortunately, the multilingual labels are lost in the imported object.

The memisc package by Martin Elff fails completely. In version 0.97, Stata.file aborts the import.

(Thanks to Guido Schulz for the pointer to readstata13.)

CfP: “Recent Developments in Metadata Capture, Discovery and Harmonization in the Social Sciences”

University of Leicester (logo)

This session invites presentations dealing with structured metadata in a standardized form across the survey life-cycle: models, systems and tools for, e.g., instrument design, data entry, data processing, maintaining data documentation, and capturing and storing the metadata within a repository for later reuse. There is increased interest in supporting the comparison and harmonization of studies/waves over space and time, and across studies, especially at the level of theoretical concepts, questions, and variables, to which structured metadata is well suited.

Capturing metadata as early on in the survey life-cycle as possible in a structured way enhances transparency and quality, enables reproducible research and reuse of survey components for other waves or surveys.

A wide range of different products and services for different users can be generated on the basis of computer-processable metadata, such as web-based information systems, traditional codebooks, command setups for statistical packages, question banks, and searching and locating of data, all of which assist in the use or interpretation of the data.

Papers are invited on, but not limited to, the following topics: reuse of metadata across space, time, and studies, metadata banks such as for questions and classifications, metadata-driven processes, and metadata-driven information systems, possibly using the major specification for social science metadata, DDI Lifecycle (DDI 3 branch of the Data Documentation Initiative).

The session is aimed at survey designers and implementers, data and metadata managers, information system managers of cross-national surveys, metadata experts, and others.

Session at the 9th International Conference on Social Science Methodology (Research Committee on Logic and Methodology, RC33)
Conference dates: 11-16 September 2016
Venue: University of Leicester, UK
Deadline: 21 January 2016

[via DDI-users]

CfP: “The Role and Benefit of Metadata Capture, Discovery and Harmonization in Survey Research”

ACSPRI Logo

This session invites presentations dealing with structured metadata in a standardized form across the data life-cycle: case studies, systems and tools for, e.g., instrument design, data entry, data processing, maintaining data documentation, and capturing and storing the metadata within a repository for later re-use.

Capturing metadata as early on in the survey life-cycle as possible in a structured way enhances transparency and quality, supports harmonization and comparison of studies, and enables reproducible research and reuse of survey components for other waves or surveys. Metadata management can be seen as an integrated part of the survey research process.

A wide range of different products and services for different audiences can be generated on the basis of metadata, such as web-based information systems, traditional codebooks, command setups for statistical packages, question banks, and searching and locating of data.

Papers are invited on, but not limited to, the following topics: reuse of metadata across space, time, and studies, metadata banks such as for questions and classifications, and metadata-driven information systems, possibly using DDI Lifecycle (Data Documentation Initiative).

The session is aimed at survey designers and implementers, data and metadata managers, information system managers of cross-national surveys, metadata experts, and others.

Session at the 5th Biennial ACSPRI Social Science Methodology Conference 2016
ACSPRI – Australian Consortium for Social and Political Research Incorporated
Theme: Social science in Australia: 40 years on
Conference dates: Tuesday July 19 – Friday July 22, 2016
Venue: The University of Sydney, Sydney, Australia
Deadline: Friday March 4, 2016

[via DDI-users]

Dictionary with terms for the Research Data Domain

Screenshot: CASRAI dictionary for the Research Data Domain

The Consortia Advancing Standards in Research Administration Information (CASRAI) provides a dictionary containing terms for the Research Data Domain. Each term has a unique identifier (UUID) and a URL that can be used as references, so that documents can hyperlink terms to their definitions and become easier to read. The URL for each term contains a link to a Discussion page to complete the feedback loop with the community of users.

The Glossary has been developed in consultation with vocabulary experts and practitioners from a wide cross-section of stakeholder groups. It is meant to be a practical reference for individuals and working groups concerned with the improvement of research data management, and as a meeting place for further discussion and development of terms. The aim is to create a stable and sustainably governed glossary of community accepted terms and definitions, and to keep it relevant by maintaining it as a ‘living document’ that is updated when necessary.

From other sections of the dictionary one can return to this pilot section using the top-menu item Filter by and selecting Research Data Domain. To see all terms in the CASRAI dictionary (including the RDC terms), go here: http://dictionary.casrai.org/Category:Terms

In addition to direct comments on specific terms in the Glossary, CASRAI is very interested in receiving feedback about the Glossary in general. Here is a short survey: https://www.surveymonkey.com/r/Glossary_ResearchDataManagement

This section of the dictionary is developed and maintained by Research Data Canada’s (RDC) Standards & Interoperability Committee (http://www.rdc-drc.ca) in collaboration with CASRAI. It is made publicly available under a Creative Commons Attribution Only license (CC-BY).

[via DDI-users]

Two new Stata packages -useold- and -saveascii- now available on SSC

Thanks to the SSC maintainer Kit Baum, two new commands are available on SSC: useold and saveascii. Both deal with Unicode translation in Stata 14 (or newer).

useold works as a drop-in replacement for Stata’s regular use command. If the Stata instance executing the command is version 14 or newer, it checks whether Unicode translation is necessary and, if so, runs unicode translate on a temporary copy of the file before opening it. The default code page of the operating system is assumed as the source encoding (which might be wrong and can be overridden via an option).
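Why the assumed source encoding matters is easy to see outside Stata. The following minimal Python sketch is not how useold is implemented; it only illustrates the idea of decoding legacy bytes with the system's default code page unless another one is given. The function name to_unicode and the example label are made up for illustration.

```python
import locale
from typing import Optional


def to_unicode(raw: bytes, source_encoding: Optional[str] = None) -> str:
    """Decode legacy bytes, defaulting to the system's preferred encoding."""
    if source_encoding is None:
        # Mirrors the "default code page of the operating system" assumption.
        source_encoding = locale.getpreferredencoding(False)
    return raw.decode(source_encoding)


# A label as a Windows-1252 system would have stored it (illustrative data).
label_bytes = "Größe".encode("cp1252")

correct = to_unicode(label_bytes, "cp1252")   # right guess: umlauts survive
garbled = to_unicode(label_bytes, "cp1251")   # wrong guess: Cyrillic code page
```

With the right code page the label comes back intact. With the wrong one the decode still succeeds but silently yields different characters, which is why an explicit override is worth having.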

You can install useold with:

ssc install useold

saveascii works as a drop-in replacement for Stata’s regular saveold command. It implements conversion functions as presented by Alan Riley here on Statalist. If the Stata instance executing the command is version 14 or newer, all Unicode contents (data labels, variable names, variable labels, value label names and contents, characteristics names and contents) are converted to ASCII before running saveold. The default code page of the operating system is assumed as the target encoding (which might be wrong and can be overridden via an option).
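The reverse direction loses information whenever a character has no slot in the target code page. As a rough illustration in Python (again not the package's actual code; to_code_page and the sample strings are hypothetical), such characters can be replaced by a question mark:

```python
def to_code_page(text: str, target_encoding: str = "cp1252") -> bytes:
    """Encode Unicode text for a legacy code page, replacing what does not fit."""
    # errors="replace" substitutes "?" for characters missing from the target.
    return text.encode(target_encoding, errors="replace")


western = to_code_page("Größe")            # fits into Windows-1252
plain = to_code_page("Größe", "ascii")     # umlaut and sharp s are lost
polish = to_code_page("Łódź", "cp1252")    # only the accented o survives
```

Choosing the target code page that matches the machines on which the file will later be opened keeps as many characters as possible.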

You can install saveascii with:

ssc install saveascii

Both packages come with help files that contain more details on how to use them.

Working with the PASS data: User Guide Examples in SPSS

While the PASS Scientific Use File has been available in SPSS format since wave 1, PASS support documents for SPSS users have not been available so far. With the recent release of a new Quick Start File, the PASS team now provides all the worked examples from the PASS User Guide, originally done in Stata, as SPSS/PASW code. This includes examples for merging household, individual, spell and weight datasets, as well as using the cross-sectional and longitudinal weights for projections to different populations.

PASS Quick Start File – Analysing the PASS data using SPSS/PASW