Data Users Guide

This guide supports version 1.2 of the Dataverse Network application.

You start viewing and using data on the Dataverse Network homepage. These topics describe how to use the Dataverse Network:

The Dataverse Network Homepage

When you log on to the Dataverse Network, you see the Network homepage. This homepage has two tabs: Now Available and Coming Soon.

Any user can browse, search, and download files from dataverses on the Now Available tab. These dataverses are designated as released. If a study contains data files that are subsettable, you can subset and analyze those data files, and then download the subsetted data. This tab is the default view of the Network homepage.

Dataverses listed on the Coming Soon tab are designated as not released, and the contents are not available to the public. You cannot search or download studies in these dataverses.

Continue to the next topic to see how to navigate the menus.

Main Menu Options

All users can perform the following by using the main menu options:

  • Click Search/Browse on a Network page to find studies and data in the Network. You see the Now Available tab listing public dataverses.
    Click Search/Browse on a dataverse page to find studies and data in that dataverse. You see the dataverse homepage listing public studies in that dataverse.
    Click Advanced Search to define your search in greater detail. You see the Advanced Search page.
  • Click User Guides to open a new browser window and start reading our online guides.
  • Click Site Map to view all options available to you, listed in a text tree.
  • Click Contact Us on the Network homepage to fill in a form and submit a question or comment to the Network admin.
    Click Contact Us on a dataverse web page to submit a question or comment to that dataverse's admin.
  • Click Log in to access your privileged options. Click Log out to return to the unprivileged state.
  • Click <Your Username> when logged in to access your account settings.

Note: A login timeout period exists. After you log in, if you do not use the interface for a short period of time, the Network might prompt you to log in again when you do start using it.

Dataverses, Studies, and Data

The Dataverse Network hosts many distinct types of dataverses. Each dataverse can contain any number of studies, and each study can contain any number of data set files. Within a dataverse, studies can be organized into collections. A dataverse also can contain links to collections of studies in other dataverses.

You can browse, search, and view contents of released dataverses within the Network, or of any dataverse or study that you have permission to access. To view the contents of a dataverse, study, or collection, click its title. The contents are expanded and displayed below the selected entity.

Authors and administrators control access to content by using dataverse, study, and file settings and Terms of Use. For example, you can view data files for any study that is set as Released and is available to the public, or for any study that you have permission to access.

For each study you can view two distinct sets of information: the Cataloging Information and the list of study files.

Settings and Terms

Status and permission settings can apply independently to dataverses, studies, and study files.

In addition, Terms of Use can apply at any level: Network, dataverse, or study.

Dataverse Settings

When a dataverse is created, it is set as not released. A dataverse that is not released can be viewed or accessed only by a Dataverse Admin, Curator, or Contributor for that dataverse, or by any user granted permission explicitly to access it. It appears on the Coming Soon tab in the Dataverse Network. When work on the dataverse is complete and the site is set to released, it then appears on the Now Available tab in the Dataverse Network. Any user can view or access the dataverse after it is released.

Permissions can be granted to access a dataverse that is not released.

Note: When a user browses or searches a Dataverse Network, all studies and collections in a dataverse that is not released are omitted from any list of search results.
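The release rule above can be sketched in a few lines of Python. This is only an illustration of the rule as this guide states it, not the application's code: the Dataverse class, the can_view and homepage_tab functions, and the way roles and explicit grants are stored are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Role names come from the guide's wording; how the real application
# stores roles is not described there, so this layout is hypothetical.
PRIVILEGED_ROLES = {"Dataverse Admin", "Curator", "Contributor"}


@dataclass
class Dataverse:
    name: str
    released: bool = False                      # a new dataverse starts as not released
    roles: dict = field(default_factory=dict)   # username -> role in this dataverse
    granted: set = field(default_factory=set)   # usernames with explicit permission


def can_view(dataverse, username=None):
    """Return True if the user may view or access the dataverse."""
    if dataverse.released:
        return True                             # released: open to any user
    if username is None:
        return False                            # anonymous users cannot see it
    return (dataverse.roles.get(username) in PRIVILEGED_ROLES
            or username in dataverse.granted)


def homepage_tab(dataverse):
    """Tab of the Network homepage on which the dataverse is listed."""
    return "Now Available" if dataverse.released else "Coming Soon"
```

For example, a dataverse created with roles={"ann": "Curator"} is listed on Coming Soon and is visible to ann but not to an anonymous visitor until released is set to True.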

Study and File Settings

Study Settings

A study is created within a dataverse, and the initial status of that study is New. When the Contributor (author) determines that work on the study is complete, that user changes the status to In Review. A Curator or Dataverse Admin then reviews the study and changes the status to Released. Any user can then view or access that study.
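The status sequence can be pictured as a small state machine. The sketch below assumes that only the roles named in the paragraph above can make each change; the application might allow more, and the change_status function and TRANSITIONS table are invented for this illustration.

```python
# Allowed status changes and the roles that make them, as read from the text.
# Assumption: no other transitions or roles exist.
TRANSITIONS = {
    ("New", "In Review"): {"Contributor"},
    ("In Review", "Released"): {"Curator", "Dataverse Admin"},
}


def change_status(current, new, role):
    """Return the new status, or raise if the change is not allowed."""
    allowed_roles = TRANSITIONS.get((current, new))
    if allowed_roles is None:
        raise ValueError(f"no transition from {current!r} to {new!r}")
    if role not in allowed_roles:
        raise PermissionError(f"{role} cannot change a study to {new!r}")
    return new
```

Under these assumptions a study cannot jump from New straight to Released, and a Contributor cannot release a study that is In Review.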

For each study within a dataverse, permissions can be set to Restricted or Public for access to that study. Permissions can be granted to specific users or groups to access Restricted studies, and a setting can be enabled that lets users request access to Restricted studies.

File Settings

For each file within a study, permissions can be set to Restricted or Public for access to that file. Permissions can be granted to specific users or groups to access Restricted files.

Terms of Use

Dataverses, studies, and data files can have user restrictions applied. If prompted to accept Terms of Use, click the check box and then click the Continue button to view or download the information you chose.

Terms of Use are customizable and can apply at any of three levels:

  • Network level - Terms of Use can apply to general use of login accounts, to study creation and data uploads, or to study use and data downloads.
  • Dataverse level - Terms of Use can apply to study creation and data uploads, or to study use and data downloads.
  • Study level - Terms of Use can apply to use of individual studies.

If a Network has Terms of Use applied to general use, then each time you log in to the Network you must accept the Terms before you can access any options.

If a Network has Terms of Use applied to data uploads or downloads, then any study in that Network includes those Terms. When you select a study from a Network with Terms applied to data downloads, you accept the Terms to download the files. You also must accept Terms when you add a study or file to a dataverse in a Network with Terms applied to study creation.

If a dataverse has Terms of Use applied to data uploads or downloads, then any study in that dataverse includes those Terms. You must accept the Terms to add or download files.

When Terms of Use are applied at the Network or dataverse level, you must accept the Terms one time per session, for the first study file that you view or add in that Network or dataverse.

Individual studies also can have Terms of Use applied. When you select a study with Terms applied and view the Cataloging Information tab, a Terms of Use section appears at the bottom of the tab. Click the blue down-arrow to view the Terms of Use on this tab. If you choose to view, subset, or download any study files or data sets from a study with Terms applied, first you must accept the Terms.

When Terms are applied at the study level, you must accept the Terms one time per session, for the first file that you view or download in the study.
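The once-per-session behavior can be sketched as a set of accepted Terms kept for the length of a login session. The Session class and the scope keys below are hypothetical; the guide does not describe how the application tracks acceptance.

```python
class Session:
    """Terms of Use accepted so far in one login session (hypothetical model)."""

    def __init__(self):
        self.accepted = set()

    def pending_terms(self, scopes):
        """Scopes whose Terms must still be accepted before a file opens."""
        return [s for s in scopes if s not in self.accepted]

    def accept(self, scopes):
        self.accepted.update(scopes)


# Scope keys are made up for this example.
session = Session()
file_scopes = ["network", "dataverse:demo", "study:example"]
first = session.pending_terms(file_scopes)    # everything is pending: prompt the user
session.accept(first)
second = session.pending_terms(file_scopes)   # nothing is pending for the rest of the session
```

A new Session object starts with nothing accepted, which mirrors being prompted again after you log in the next time.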

Cataloging Information (Citation Fields)

When a study is created, a set of metadata is associated with that study. This metadata is called the Cataloging Information for the study. When you select a study to view it, you first see the Cataloging Information tab listing the metadata associated with that study. This is the default view of a study.

Cataloging Information contains numerous fields that help to describe the study. The amount of information you find for each study varies, based on what was entered by the author (Contributor) or Curator of that study. For example, one study might display the distributor, related material, and geographic coverage. Another study might display only the authors and the abstract. Every study includes the Citation Information fields in the Cataloging Information.

Note: A comprehensive list of all Cataloging Information fields is provided in the List of Metadata.

Cataloging Information is divided into four sections. These sections and their details are displayed only when the author (Contributor) or Curator provides the information when creating the study. Sections consist of the following:

  • Citation Information - These fields comprise the citation for the study, consisting of a global identifier for all studies and a UNF, or Universal Numerical Fingerprint, for studies that contain subsettable data files. It also can include information about authors, producers and distributors, and references to related studies or papers.
  • Abstract and Scope - This section describes the research study, lists the study's data sets, and defines the study's geographical scope.
  • Data Collection/Methodology - This section includes the technical details of how the author obtained the data.
  • Terms of Use - This information explains that the study requires users to accept a set of conditions or agreements before downloading or analyzing the data. If any Terms of Use text is displayed in the Cataloging Information section, you are prompted to accept the conditions when you click the download or analyze icons in the Files page.
    Note: A study might not contain Terms of Use, but in some cases the original parent dataverse might have set conditions for all studies owned by that dataverse. In that case, the conditions are inherited by the study and you must accept these conditions before downloading files or analyzing the data.
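If you copy a citation into your own notes or scripts, the global identifier and the UNF can be pulled out with two regular expressions. The sketch below uses the citation that appears in the Search Tips of this guide; the parse_identifiers function is an illustration only, and it assumes the handle and UNF are each written without spaces, as in that example.

```python
import re

citation = ('Gary King; Will Lowe, 2003, "10 Million International Dyadic Events", '
            'hdl:1902.1/FYXLAWZRIA UNF:3:um06qkr/1tAwpS4roUqAiw== '
            'Murray Research Archive [Distributor]')


def parse_identifiers(text):
    """Extract the handle, its authority and study ID parts, and the UNF."""
    handle = re.search(r"hdl:(\S+)", text)
    unf = re.search(r"UNF:(\S+)", text)
    result = {"handle": None, "authority": None, "study_id": None, "unf": None}
    if handle:
        result["handle"] = handle.group(1)
        # Assumption: the handle has the form authority/study ID.
        authority, _, study_id = handle.group(1).partition("/")
        result["authority"], result["study_id"] = authority, study_id
    if unf:
        result["unf"] = unf.group(1)
    return result


ids = parse_identifiers(citation)
```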

List of Study Files

When you view a study, click the Documentation, Data and Analysis tab to view a list of all electronic files associated with the study that were provided by the author or Curator. See the Legend at the bottom of the Documentation, Data and Analysis tab to interpret any icons associated with these files.

A study might contain documentation files and data files. When you upload data files of the type .dta, .sav, or .por to the Network, they are converted to .tab tab-delimited files. These .tab files are subsettable, and can be subsetted and analyzed online by using the Dataverse Network application.

You can identify a subsettable data file by the analysis icon and the number of variables and categories listed next to the file name. Other files that also contain data might be associated with a study, but the Dataverse Network application does not recognize them as data (or subsettable) files.

Browse and Search a Dataverse

To find a study or data set, you can search or browse studies offered in any released dataverse on the Now Available tab. Each dataverse offers a hierarchical organization comprising one or more collections of data sets with a particular theme. Most dataverses allow you to search for data within their files, or you can start browsing at the dataverse closest to your substantive interests.

Dataverses are served by Dataverse Networks (DVNs). To view a live DVN installation, go to the IQSS Dataverse Network and browse or search our dataverses.

Keep reading to find out more about these subjects:

Browse Collections

You can browse all public dataverses from the Network homepage Now Available tab. Click the title of a dataverse to browse that dataverse's collections and studies. Click the title of a collection to view a list of studies and subcollections for that selection. Click the title of a study to view the Cataloging Information and study files for that selection.

When you select a dataverse to view its contents, the homepage opens to the root collection, and the dataverse's studies are displayed directly under the root collection name. If the root collection contains other collections, then those collections are listed and not the studies within them. You must select a collection title to view the studies contained within it.

Note: If a dataverse includes links to collections from another dataverse and the root collection does not contain other collections, the homepage opens to a list of the root and linked collections.

Search - Basic

You can search for studies across the entire DVN from the Network homepage, or search within a dataverse from the dataverse homepage. When you search across the Network, studies from restricted dataverses are not included in the search. If an entire study is restricted (both metadata and files), it is not included in search results unless you have access to that data. After your search is complete, you can further narrow your list of data by searching again in the results. See Search Tips for search examples and guidelines.

When you enter more than one term in the search text field, the results list contains studies that have these terms near each other within the study fields searched. For example, if you enter United Nations, the results include studies where the words United and Nations are separated by no more than four words in the same study field, such as abstract or title.

You can restrict a search to content in the following study fields by using the basic Search drop-down list:

  • Cataloging Information - This is the default field to search. It supports a search in any field of the studies' Cataloging Information, which includes citation information, abstract and other scope-related information, methodology, and Terms of Use.
  • Title - This option searches only the title field of studies.
  • Author - This option searches only the author fields of studies.
  • Study ID - This option searches the ID field of studies, without including the handle (hdl) and the authority (1902.X) values.
  • Variable Information - This option searches the variable name and description fields in the studies' data files, given that a data file is subsettable. Results of a search using this field list the studies with the file and the variable name in which the search term was found.

Search Tips

Use the following guidelines to search effectively within a Network or a dataverse:

  • The default search syntax uses AND logic within individual fields. That is, if you enter more than one term, the search engine looks for all terms within a single field, such as title or abstract.
    For example, if you enter United Nations report, the results list any studies that include the terms United, Nations, and report within a single metadata field.
  • The search logic looks for multiple terms within a specific proximity to one another, and in the same field. The current proximity criterion is four words. That is, if you enter two search terms, both terms must be within four words of each other in the same field to be returned as a result.
    For example, you might enter 10 year in a basic search. If a study includes the string 10 millions deaths per year within a metadata field, such as abstract, that study is not included in the search results. A study that contains the string 10 per year within the abstract field is included in the search results.
  • You can enter one term in the search field, and then search within those results for another term to narrow the results further. This might be more effective than searching for both terms at one time, if those terms do not meet the proximity and field limits specified previously.
    You could first search for an author's name, and then search those results for a specific term in the title. If you try searching for both terms in the author and title fields together, you might not find the study for which you are looking.
    For example, you can search the IQSS DVN for the following study:

    Gary King; Will Lowe, 2003, "10 Million International Dyadic Events", hdl:1902.1/FYXLAWZRIA UNF:3:um06qkr/1tAwpS4roUqAiw== Murray Research Archive [Distributor]

    If you type King, 10 Million in the Search field and click Search, you see 0 matches were found in the Results field. If you type 10 in the Search field and click Search, you see something like 1621 matches were found in the Results field. But if you first type King in the Search field and click Search, then type 10 Million in the Search field and click Search again, you see something like 4 matches were found in the Results field.
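The tips above can be illustrated with a small Python sketch of same-field proximity matching and of searching within results. It is not the search engine's implementation: the functions, the dictionary layout of a study, and the max_between parameter are assumptions, and exactly how words are counted at the four-word boundary is left as a parameter rather than asserted.

```python
import re


def tokens(text):
    """Lowercased words of a field."""
    return re.findall(r"\w+", text.lower())


def within_proximity(field_text, term_a, term_b, max_between=4):
    """True if both terms occur in the field with at most max_between words between them."""
    words = tokens(field_text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(i != j and abs(i - j) - 1 <= max_between
               for i in pos_a for j in pos_b)


def search_pair(studies, term_a, term_b, max_between=4):
    """Studies in which both terms are close together in one field."""
    return [s for s in studies
            if any(within_proximity(v, term_a, term_b, max_between)
                   for v in s.values())]


def search_one(studies, term):
    """Studies in which the term occurs in any field."""
    return [s for s in studies
            if any(term.lower() in tokens(v) for v in s.values())]


studies = [{"author": "Gary King; Will Lowe",
            "title": "10 Million International Dyadic Events"}]
together = search_pair(studies, "King", "Million")               # terms are in different fields
narrowed = search_one(search_one(studies, "King"), "Million")    # search within results
```

Here together is empty because King and Million never share a field, while narrowed still contains the study, which is the behavior the last tip describes.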

Search - Advanced

In an advanced search, you can refine your criteria by choosing which Cataloging Information fields to search. You also can apply logic to the field search. For text fields, you can specify that the field searched either contains or does not contain the text that you enter. For date fields, you can specify that the field searched is either later than or earlier than the date that you enter. Refer to the Documentation page for Query Syntax at the Lucene web site for full syntax details.

To perform an advanced search, click the Advanced Search link at the top-right of the Search panel. You can search the following study metadata fields by using the Search Scope drop-down list:

  • Title - Title field of studies' Cataloging Information.
  • Author - Author fields of studies' Cataloging Information.
  • Study ID - ID assigned to studies.
  • Other ID - A different ID previously given to the study by another archive.
  • Abstract - Any words in the abstract of the study.
  • Keyword - A term that defines the nature or scope of a study. For example, elections.
  • Keyword Vocabulary - Reference to the standard used to define the keywords.
  • Topic Classification - One or more words that help to categorize the study.
  • Topic Classification Vocabulary - Reference used to define the Topic Classifications.
  • Producer - Institution, group, or person who produced the study.
  • Distributor - Institution that is responsible for distributing the study.
  • Funding Agency - Agency that funded the study.
  • Production Date - Date on which the study was created or completed.
  • Distribution Date - Date on which the study was distributed to the public.
  • Date of Deposit - Date on which the study was uploaded to the Network.
  • Time Period Cover Start - The beginning of the period covered by the study.
  • Time Period Cover End - The end of the period covered by the study.
  • Country/Nation - The country or countries where the study took place.
  • Geographic Coverage - The geographical area covered by the study. For example, North America.
  • Geographic Unit - The smallest geographic unit in which the study took place, such as state.
  • Universe - Universe of interest, population of interest, or target population.
  • Kind of Data - The type of data included in the file, such as survey data, census/enumeration data, or aggregate data.
  • Variable Information - The variable name and description in the studies' data files, given that the data file is subsettable. It returns the studies that contain the file and the variable name where the search term was found.
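The contains, does not contain, later than, and earlier than logic can be sketched as small filter functions. This is a simplified model, not the Lucene query that the application builds: the field keys are hypothetical, text matching is reduced to a case-insensitive substring test, and combining several criteria with AND is an assumption.

```python
from datetime import date


def text_filter(field_name, text, contains=True):
    """Keep studies whose field contains (or does not contain) the text."""
    def check(study):
        found = text.lower() in study.get(field_name, "").lower()
        return found if contains else not found
    return check


def date_filter(field_name, when, later=True):
    """Keep studies whose date field is later than (or earlier than) a date."""
    def check(study):
        value = study.get(field_name)
        if value is None:
            return False            # studies without the date never match
        return value > when if later else value < when
    return check


def advanced_search(studies, filters):
    """Studies that satisfy every filter (assumed AND logic)."""
    return [s for s in studies if all(f(s) for f in filters)]


# Made-up records for the example.
studies = [
    {"title": "Election Survey", "keyword": "elections", "production_date": date(2004, 5, 1)},
    {"title": "Health Panel", "keyword": "health", "production_date": date(1998, 1, 15)},
]
recent_elections = advanced_search(studies, [
    text_filter("keyword", "elections"),
    date_filter("production_date", date(2000, 1, 1), later=True),
])
```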

Sort Results

When your search is complete, the results page lists studies that met the search criteria in order of relevance. For example, a study that includes your search term within the Cataloging Information in ten places appears before a study that includes your search term in the Cataloging Information in only one place.

You can sort search results by title, study ID, or number of downloads (that is, the number of times users downloaded any file belonging to that study). Click the Sort By drop-down list to choose your sort order.

When you browse a collection, the studies contained within the collection are listed alphabetically by title.
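The sort orders can be sketched as follows. The relevance function simply counts occurrences of the search term, which follows the example above but is only a stand-in for the score that the search engine computes. The record layout is hypothetical, and sorting downloads from most to fewest is an assumption.

```python
def relevance(study, term):
    """Number of times the term occurs across the study's cataloging fields."""
    term = term.lower()
    return sum(value.lower().split().count(term)
               for value in study["cataloging"].values())


def sort_results(results, by="relevance", term=None):
    if by == "relevance":
        return sorted(results, key=lambda s: relevance(s, term), reverse=True)
    if by == "title":
        return sorted(results, key=lambda s: s["title"].lower())
    if by == "study_id":
        return sorted(results, key=lambda s: s["study_id"])
    if by == "downloads":
        # Assumption: most-downloaded studies first.
        return sorted(results, key=lambda s: s["downloads"], reverse=True)
    raise ValueError(f"unknown sort order: {by}")


# Made-up records for the example.
results = [
    {"title": "Trade Flows", "study_id": "B2", "downloads": 40,
     "cataloging": {"abstract": "trade data"}},
    {"title": "Alliance Data", "study_id": "A1", "downloads": 7,
     "cataloging": {"abstract": "trade and trade partners", "title": "trade"}},
]
by_relevance = sort_results(results, by="relevance", term="trade")
by_title = sort_results(results, by="title")
```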

Download Study Files

You can download any of the following within a study:

The default format for all subsettable data file downloads is tab-delimited. When you download one or more subsettable files in tab-delimited format, the file contains a header row. When you download one subsettable file, you can select from the following formats in addition to tab-delimited:

Note: Studies and data files often have user restrictions applied. If prompted to accept Terms of Use for a study or file, check the I Accept box and then click the Continue button to view or download the file.

Download All Files in a Study

If you download all data files within a study, the files are downloaded in a zipped archive, and the individual files are in tab-delimited format. You must unzip the archive to view or use the individual, tab-delimited data files.

To download all data sets associated with a study:

  1. Go to the Documentation, Data and Analysis tab for the study.
  2. Click the download icon at the top of the list of files.
  3. Follow your browser's prompts to open or save the zipped archive of all study files to your computer's disk drive.
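After you save the zipped archive, you can unzip it and read the tab-delimited files with any tool you like. The following Python sketch uses only the standard library. It assumes that the data files in the archive end in .tab, are encoded as UTF-8, and begin with a header row; the read_tab_files function name is made up for this example.

```python
import csv
import io
import zipfile


def read_tab_files(archive_bytes):
    """Return {file name: list of row dictionaries} for each .tab file in a zipped archive."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".tab"):
                continue                      # skip documentation and other files
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # The header row supplies the dictionary keys.
                tables[name] = list(csv.DictReader(text, delimiter="\t"))
    return tables
```

To use it with a saved download, read the file in binary mode, for example read_tab_files(open("study.zip", "rb").read()), where study.zip is whatever name you gave the archive.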

Download All Files in a Category

When files are uploaded to a study, the Contributor assigns a category to the file: Documentation or Data File.

If you download all data files within a category, the files are downloaded in a zipped archive, and the individual files are in tab-delimited format. You must unzip the archive to view or use the individual, tab-delimited data files.

To download all data sets within a category:

  1. Go to the Documentation, Data and Analysis tab for the study.
  2. Click the download icon beside the selected category.
  3. Follow your browser's prompts to open or save the zipped archive of all study files within that category to your computer's disk drive.

Download Individual Files

If you download an individual data file, you can select from several file formats in which to download the data. If you select tab-delimited format, the file is downloaded directly. If you select any other format, the file is downloaded in a zipped archive. You must unzip the archive to view or use the individual data file.

To download one full data set without subsetting or analyzing the contents:

  1. Go to the Documentation, Data and Analysis tab for the study.
  2. Use the Type pull-down menu to select the downloaded file format.
  3. Click the download icon beside the selected data set file.
  4. Follow your browser's prompts to open or save the data file to your computer's disk drive.

Subset, Analyze, and Download Data Sets

Data files (subsettable files) can be subsetted and analyzed online by using the Dataverse Network application. For analysis, the Dataverse Network offers a user interface to Zelig, a powerful, R-based statistical computing tool. A comprehensive set of statistical analysis models is provided.

After you find the data set that you want, access the Subset and Analysis options to use the online tools. Then, you can subset data by variables or observations, translate it into a convenient format, download subsets, and apply statistics and analysis.

Review the Data Subset and Recode Tips before you start.

Statistical Analysis Models

You can apply any of the following advanced statistical models to all or some variables in a data set:

  • Descriptive statistics: Univariate numeric or graphic summaries
  • Categorical data analysis: Cross tabulation
  • Event count models, for event count dependent variables:
    • Negative binomial regression
    • Social network Poisson regression
    • Poisson regression
  • Models for continuous bounded dependent variables:
    • Exponential regression for duration
    • Gamma regression for continuous positives
    • Log-normal regression for duration
    • Social network gamma regression for continuous positives
    • Weibull regression for duration
  • Models for continuous dependent variables:
    • Least squares regression
    • Social network least-squares regression
    • Social network normal regression
    • Linear regression for left-censoreds
  • Models for dichotomous dependent variables:
    • Logistic regression
    • Social network complementary log-log regression
    • Social network logistic regression
    • Social network probit regression
    • Probit regression
    • Rare events regression
  • Models for ordinal dependent variables:
    • Ordinal logistic regression for ordered categoricals
    • Ordinal probit regression for ordered categoricals

Access Subset and Analysis Options

You can subset and analyze data files before you download the file or your subsets.

To access the Subset and Analysis options for a data set:

  1. Click the title of the study from which you choose to analyze or download a file or subset.
  2. Click the Documentation, Data and Analysis tab for the study.
  3. In the list of study files, locate the data file that you choose to download, subset, or analyze.
    You can download data sets for a file only if the file entry includes the subset icon.
  4. Click the subset icon associated with the selected file.
    If prompted, check the I accept box and click Continue to accept the Terms of Use, and then click the subset icon again.

Subset or Recode Data

Review the Data Subset and Recode Tips before you start work with a study's files.

To subset and recode variables within a data set:

  1. In the Subset and Analysis page, click the Subset and Recode tab.
  2. From the Show drop-down list, select one of the following options to show variables in predefined quantities: All, 50, 20, or 10.
  3. Scroll down the screen and click the check boxes to select variables from the table of available values. When you select a variable, it is added to the Selected Variables box at the top of the tab.
    To remove a variable from this box, deselect it from the Variable Type list at the bottom of the screen.
    To select all variables, click the check box beside the column name, Variable Type.
  4. Select one variable in the Selected Variables box, and then click the right Arrow button.
    The name of the variable appears in the New Variable Name and New Variable Label boxes.
  5. In the New Variable Name field, change the variable name to a unique value that is not used in the data file.
    The new variable label is optional and you can leave it blank.
  6. In the table below the Variable Name fields, you can check one or more values to drop them from the subset, or enter new values or ranges (as a condition) as needed. Click the Add Value/Range button to create more entries in the value table.
    (See Data Subset and Recode Tips for more information about adding values and ranges.)
  7. Click the Apply Recodes button.
    Your renamed variables appear in the Selected Variables box.
    Note: If you enter a variable name that is already in use, you see the message The variable Name you entered is found among the existing variables; enter a new variable name.
  8. Select another variable in the Selected Variables box, click the right Arrow button, and repeat the recode action.
    Repeat this process for each variable that you choose to recode.
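The effect of a recode can be pictured with a short Python sketch. It is only a model of the steps above, not what the application runs: the recode function is invented for this example, rows are plain dictionaries, and carrying unmapped values over unchanged is an assumption.

```python
def recode(rows, old_name, new_name, mapping, drop=()):
    """Add new_name to each row by recoding old_name.

    mapping gives old value -> new value; rows whose old value is in
    drop are left out of the result.
    """
    if rows and new_name in rows[0]:
        # Mirrors the rule that the new variable name must be unique.
        raise ValueError(f"variable name {new_name!r} is already in use")
    recoded = []
    for row in rows:
        value = row[old_name]
        if value in drop:
            continue
        # Assumption: values without a mapping are kept as they are.
        recoded.append({**row, new_name: mapping.get(value, value)})
    return recoded


# Made-up data for the example.
rows = [{"educ": 1}, {"educ": 2}, {"educ": 3}, {"educ": 9}]
result = recode(rows, "educ", "educ_group", {1: "low", 2: "low", 3: "high"}, drop={9})
```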

Continue to download a subset.

Data Subset and Recode Tips

Use the following guidelines when working with data files:

  • Subsetting:
    • If the variable you chose for subsetting has information about its value-labels, you can prefill the table with these data for convenience.
    • To exclude a value in the last column of the table, click the check box in the same row.
    • To include a particular value or range, enter it in the last column whose header shows the name of the variable for subsetting.
  • Recoding:
    • You must fill at least the first (new value) and last (condition) columns of the table; the second column is optional and for a new value label.
    • If the old variable you chose for recoding has information about its value-labels, you can prefill the table with these data for convenience, and then modify these prefilled data.
    • To exclude a value from your recoding scheme, click the check box in the same row.
  • Entering a value or range as a condition for subsetting or recoding:
    • Suppose the variable you chose for recoding is x.
      If your condition is x==3, enter 3.
      If your condition is x < -3, enter (--3.
      If your condition is x > -3, enter -3-).
      If your condition is -3 < x < 3, enter (-3, 3).
    • Use square brackets ([]) for closed ranges.
    • You can enter nonoverlapping values and ranges separated by a comma, such as 0,[7-9].
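To make the meaning of these conditions concrete, the sketch below expresses each kind of entry as a Python test on a value x. It does not parse the entry syntax shown above; the helper names value, interval, and any_of are made up, and each comment notes which entry from the tips the call is meant to correspond to.

```python
import math


def value(v):
    """Condition that x equals one value, such as the entry 3."""
    return lambda x: x == v


def interval(low=-math.inf, high=math.inf, closed=False):
    """Condition that x lies in a range; closed=True stands for square brackets."""
    def check(x):
        return (low <= x <= high) if closed else (low < x < high)
    return check


def any_of(*conditions):
    """Several nonoverlapping values or ranges separated by commas."""
    return lambda x: any(c(x) for c in conditions)


equals_3 = value(3)                                          # x == 3
below_minus_3 = interval(high=-3)                            # x < -3
above_minus_3 = interval(low=-3)                             # x > -3
between = interval(-3, 3)                                    # -3 < x < 3
zero_or_7_to_9 = any_of(value(0), interval(7, 9, closed=True))   # 0,[7-9]
```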

Download Subsets

You can download a subset of variables within a study file. You also can recode a subset of variables and download the recoded subset, if you choose.

To download a subset of variables:

  1. In the Subset and Analysis page, click the Download Subset tab.
  2. Click the radio button for the appropriate File Format in which to download the variables: Text, R Data, S plus, or Stata.
  3. Click the Show drop-down list to select the quantities of variables to list at one time: All, 50, 20, or 10.
  4. Scroll down the screen and click the check boxes to select variables from the table of available values. When you select a variable, it is added to the Selected Variables box at the top of the tab.
    To remove a variable from this box, deselect it from the Variable Type list at the bottom of the screen.
    To select all variables, click the check box beside the column name, Variable Type.
  5. Click the Download button. If prompted, check the I accept box and then click the Continue button to accept the Terms of Use. Then, click Download again.
  6. Follow your browser's prompt to open or save the data file to your computer's disk drive.
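If you already have a full tab-delimited file on disk, the same kind of variable subset can be produced offline. The sketch below is a stand-alone illustration using Python's csv module, not the application's subsetting code; the subset_columns function name and the sample data are made up.

```python
import csv
import io


def subset_columns(tab_text, selected):
    """Return tab-delimited text that keeps only the selected variables, in the given order."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    available = reader.fieldnames or []
    missing = [name for name in selected if name not in available]
    if missing:
        raise KeyError(f"variables not in file: {missing}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=selected, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()                  # keep a header row, as in the full download
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


full = "id\tage\tincome\n1\t30\t100\n2\t41\t250\n"
subset = subset_columns(full, ["income", "id"])
```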

Apply Descriptive Statistics

To apply descriptive statistics to a data set or subset:

  1. In the Subset and Analysis page, click the Descriptive Statistics tab.
  2. Click one or both of the Descriptive Statistics options: Univariate Numeric Summaries and Univariate Graphic Summaries.
  3. From the Show drop-down list, select one of the following options to show variables in predefined quantities: All, 50, 20, or 10.
  4. Scroll down the screen and click the check boxes to select variables from the table of available values. When you select a variable, it is added to the Selected Variables box at the top of the tab.
    To remove a variable from this box, deselect it from the Variable Type list at the bottom of the screen.
    To select all variables, click the check box beside the column name, Variable Type.
  5. Click the Run Statistics button.
    If prompted, check the I accept box and then click the Continue button to accept the Terms of Use. Then, click Run Statistics again.
    You see the Dataverse Analysis page.
  6. Under Citation Information about the data set, click Citation Info to display various methods of citation for the study's replication data.
    Under Results, click Descriptive Statistics to check each model's estimation results and descriptive statistics (if applicable). You also can click Show log to view the R log file.
    Click the e-mail link to contact the Dataverse Admin about these results.
    You can retain these details by copying and pasting the text into another document file or saving the page.
  7. Click Go back to the previous page, at the top of the screen.
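To check a result on a downloaded subset, you can compute a univariate numeric summary yourself. The sketch below uses Python's statistics module. Which statistics the application reports is not listed in this guide, so the set chosen here (count, minimum, maximum, mean, median, and sample standard deviation) is an assumption, as is the numeric_summary function name.

```python
import statistics


def numeric_summary(values):
    """Univariate numeric summary of a non-empty list of numbers."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        # The sample standard deviation needs at least two observations.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


summary = numeric_summary([4, 1, 3, 2])
```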

Perform Advanced Analysis

To run statistical models for selected variables:

  1. In the Subset and Analysis page, click the Advanced Statistical Analysis tab.
  2. Scroll down the screen and click the check boxes to select variables from the table of available values. When you select a variable, it is added to the Selected Variables box at the top of the tab.
    To remove a variable from this box, deselect it from the Variable Type list at the bottom of the screen.
    To select all variables, click the check box beside the column name, Variable Type.
  3. Select a model from the Choose a Statistical Model drop-down list.
  4. Select one variable in the Selected Variables box, and then click the applicable arrow button to assign a function within that analysis model to that variable.
    You see the name of the variable in the appropriate function box.
    Note: Some functions allow a specific type of variable only, while other functions allow multiple variable types. Types include Character, Continuous, and Discrete. If you assign an incorrect variable type to a function, you see an Incompatible type error message.
  5. Repeat the variable and function assignments until your model is complete.
  6. Select your Output and Analysis options.
  7. Click the Run Model button.
    If prompted, check the I accept box and then click the Continue button to accept the Terms of Use. Then, click Run Model again.
    You see the Dataverse Analysis page.
  8. Under Citation Information about the data set, click Citation Info to display various methods of citation for the study's replication data.
    Under Results, click Descriptive Statistics to check each model's estimation results and descriptive statistics, if applicable. You also can click Show log to view the R log file, and click the e-mail link to contact the Dataverse Admin about these results.
    To retain these details, copy and paste the text into a document file or save the page.
  9. Click the Go back to the previous page link at the top of the screen.
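The application runs these models through Zelig in R. As a rough picture of what the simplest one, least squares regression, estimates, the following Python sketch fits a line to one explanatory variable. It is an illustration under that single-variable assumption, not a substitute for the application's output; the least_squares function and the sample numbers are made up.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line through the points (x, y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two different values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
```

With these points, which lie exactly on the line y = 1 + 2x, the fit returns an intercept of 1 and a slope of 2.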