Additional Information
These supplements support version 1.3 of the Dataverse Network software.
Supplemental information available about dataverses and Dataverse Networks includes the following:
Glossary
- collection
- A collection is a way to group or categorize a set of studies. A dataverse can have a tree of collections and sub-collections. A study can belong to multiple collections. Collections can be defined as a query or as an association of specific studies. When a collection is defined as a query, any new study added that satisfies that query (for example, author: Smith) is added automatically to the collection. The root collection of a dataverse by default contains all studies that are owned by that dataverse. It is defined as an association. A dataverse also can link to an entire collection tree from another dataverse.
- customization
- You can customize the following components of your dataverse: name and alias, banner and footer (to have the style of your website), homepage layout, and Contact Us e-mail address. You also can set additional fields to be displayed in Search results.
- dataverse
- A Dataverse Network can contain multiple dataverses. Each dataverse is a virtual archive or organizer of research data. It can contain data sets (studies) uploaded specifically to that dataverse, or data sets that belong to other dataverses. The data sets can be organized by collections and sub-collections. In addition to uploading your own studies and setting up your own collections, you can customize a dataverse in the following ways:
- Modify the banner and footer.
- Choose to display announcements or descriptions on the homepage.
- Choose to display a subset of the most recent studies uploaded to the dataverse.
- Set up a description of the dataverse in the About page.
- Set up a Contact Us e-mail so users can send messages to the dataverse administrator.
- restricted versus public
- When a dataverse is first created, it is set as restricted. A restricted dataverse cannot be accessed by any users unless they are admins, curators, or contributors of that dataverse, or they are granted special permission to access it. When a user browses or searches a Dataverse Network, all studies and collections in a restricted dataverse are omitted from any view or search results.
- study
- A study is a logical grouping of one or more data sets. A study contains cataloging information. Only Title and ID information is required in the catalog; however, there are nearly 100 cataloging fields available to specify a study, including details for authors, producers and distributors, the scope of the study, the methodology used, and more. A study typically includes a set of electronic files. Some files might be documentation related to the study and other files might be data.
- study fields
- Study fields is another name for the citation fields that appear on a study's Cataloging Information page.
- study files
- The Files page of a study lists all electronic files associated with that study. These files are provided by the author or curator. The study might contain documentation files and data files. In the Dataverse Network, data files (sometimes called subsettable files) are files that you can subset and analyze online by using the Dataverse Network tools. You can differentiate a data file from other files because an analysis icon and the number of variables and categories are displayed next to the file. Other files might also contain data, but the Dataverse Network application does not recognize them as data (subsettable) files.
- subsettable
- The Dataverse Network currently treats STATA (.dta) and SPSS (.sav or .por) formatted data files as subsettable. When a file is subsettable, you can analyze it online or download a subset (selection) of the variables in the file. You then can recode the variables and apply descriptive statistics, or use any of the models provided by the Zelig statistical package. See Enter Catalog Information and Upload Study Files for more information.
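As a rough illustration of the subsettable entry above, the following Python sketch separates a study's files into candidate data files and other files by extension. The function names are hypothetical, and checking the extension alone is an assumption made for this example; it is not a description of how the Dataverse Network application itself recognizes data files.

```python
from pathlib import PurePath

# Extensions the glossary lists as subsettable: STATA (.dta) and SPSS (.sav, .por).
SUBSETTABLE_EXTENSIONS = {".dta": "STATA", ".sav": "SPSS", ".por": "SPSS"}


def subsettable_format(filename):
    """Return the format name if the extension is listed as subsettable, else None."""
    suffix = PurePath(filename).suffix.lower()
    return SUBSETTABLE_EXTENSIONS.get(suffix)


def split_study_files(filenames):
    """Split file names into (candidate data files, other files), keeping order."""
    data_files, other_files = [], []
    for name in filenames:
        if subsettable_format(name) is not None:
            data_files.append(name)
        else:
            other_files.append(name)
    return data_files, other_files
```

For example, `split_study_files(["survey.dta", "codebook.pdf"])` places `survey.dta` in the first list and `codebook.pdf` in the second.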
List of Metadata
The Dataverse Network metadata is compliant with the DDI schema version 2. The Cataloging Information fields associated with each study contain most of the fields in the study description section of the DDI. As a result, DVN metadata can be mapped easily to DDI and exported in XML format for preservation and interoperability.
DVN data also is compliant with Simple Dublin Core (DC) requirements. For imports only, DVN data is compliant with the Content Standard for Digital Geospatial Metadata (CSDGM), Vers. 2 (FGDC-STD-001-1998) (FGDC).
Attached is a PDF file that defines and maps all DVN Cataloging Information fields. Information provided in the file includes the following:
- Field label - For each Cataloging Information field, the field label appears first in the mapping matrix.
- Description - A description of each field follows the field label.
- Query term - If a field is available for use in building a query, the term to use for that field is listed.
- DVN database element name - The Dataverse Network database element name for the field is provided.
- Advanced search - If a field is available for use in an advanced search, that is indicated.
- DDI element mapping for imports - For harvested or imported studies, the imported DDI elements are mapped to DVN fields.
- DDI element mapping for exports - When a study or dataverse is harvested or exported in DDI format, the DVN fields are mapped to DDI elements.
- DC element mapping for imports - For harvested or imported studies, the imported DC elements are mapped to specific DVN fields.
- DC element mapping for exports - When a study or dataverse is harvested or exported in DC format, specific DVN fields are mapped to the DC elements.
- FGDC element mapping for imports - For harvested or imported studies, the imported FGDC elements are mapped to specific DVN fields.
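To make the idea of an export mapping concrete, here is a minimal Python sketch that writes a few study fields as Simple Dublin Core elements. The field names (`title`, `study_id`, `author`), the `FIELD_TO_DC` table, and the `metadata` wrapper element are assumptions chosen for illustration; the authoritative field-by-field mapping is the one defined in the PDF matrix, not this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical DVN field names mapped to DC elements, for illustration only.
FIELD_TO_DC = {
    "title": "title",
    "study_id": "identifier",
    "author": "creator",
}


def study_to_dc(study):
    """Build an XML element holding the DC elements for one study record.

    `study` is a dict whose values are either a string or a list of strings.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for field, dc_name in FIELD_TO_DC.items():
        values = study.get(field, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = value
    return root


if __name__ == "__main__":
    record = {"title": "Example Study", "study_id": "example-id-001",
              "author": ["Smith", "Jones"]}
    print(ET.tostring(study_to_dc(record), encoding="unicode"))
```

A repeated field such as `author` produces one `dc:creator` element per value, which is how Simple Dublin Core represents multiple creators.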
Zelig Interface Schema
Zelig is statistical software for everyone: researchers, instructors, and students. It is a front-end and back-end for R (Zelig is written in R). The Zelig software:
- Unifies diverse theories of inference
- Unifies different statistical models and notation
- Unifies R packages in a common syntax
Zelig is distributed under the GNU General Public License, Version 2. After installation, the source code is located in your R library directory. You can download a tarball of the latest Zelig source code from http://gking.harvard.edu/src/contrib/.
The Dataverse Network software uses Zelig to perform advanced statistical analysis functions. The current interface schema used by the DVN for Zelig processes is in the following location:
http://thedata.org/files/thedata/schema/ZeligInterfaceDefinition_1_1.xsd
Three factors determine which Zelig models are available for analysis in the DVN:
- Some new models require data structures and modeling parameters that are not compatible with the current framework of the DVN and other web-driven applications. These types of models are not available in the DVN.
- Models must be explicitly listed in the Zelig packages to be used in the DVN, and all models must be disclosed fully, including runtime errors. Zelig models that do not meet these specifications are excluded from the DVN until they are disclosed with a complete set of information.
- An installation-based factor also can limit the Zelig models available in the DVN. GCC version 4.0 or later must be installed on any Linux-based R machine used with the DVN, in order to install and run a key Zelig package, MCMCpack. If a Linux machine that is designated for R and used for DSB services does not have the minimum version of GCC installed, the DVN loses at least eight models from the available advanced analysis models.
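The installation-based factor above can be checked before MCMCpack is installed. The following Python sketch compares a GCC version string against the 4.0 minimum. The helper names are hypothetical, and reading the version with `gcc -dumpversion` is an assumption about how the check would be run on the R machine; it is not part of the DVN installation procedure.

```python
import re
import subprocess

# Minimum GCC version stated above for installing and running MCMCpack.
MINIMUM_GCC = (4, 0)


def parse_version(text):
    """Extract (major, minor) from a version string such as '4.1.2' or '11'."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)  # a missing minor part is treated as 0
    return (major, minor)


def meets_minimum(version_text, minimum=MINIMUM_GCC):
    """Return True if the version string is at least the required minimum."""
    return parse_version(version_text) >= minimum


def installed_gcc_version():
    """Ask the local gcc for its version string (requires gcc on the PATH)."""
    result = subprocess.run(["gcc", "-dumpversion"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

On a machine with GCC installed, `meets_minimum(installed_gcc_version())` reports whether the stated requirement is satisfied.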