Harvard Dataverse FAQ

  1. What is the Harvard Dataverse Network?
  2. What is a Dataverse?
  3. What is a Study?
  4. Why share research data with the Harvard Dataverse Network?
  5. How does the Dataverse Network encourage long term preservation & good archival practices?
  6. Is the data permanently stored?
  7. Who can deposit data at the Harvard Dataverse Network?
  8. Is it free for me to deposit data?
  9. Who can access data?
  10. Will I have control of my data? 
  11. How can I control who has access to download or see the data and its metadata?
  12. What do I need to submit data?
  13. Are there any file format requirements?
  14. What best practices can I follow for data preparation?
  15. Can I see if someone downloads/views the data?
  16. How can I get credit for the data if someone uses it?
  17. How do I upload my research data onto the Harvard Dataverse Network?

1. What is the Harvard Dataverse Network?
The Harvard Dataverse Network is a repository for sharing, citing and preserving research data. It is open to all scientific data from all disciplines worldwide, and it includes the world's largest collection of social science research data.

2. What is a Dataverse?
Dataverse: A container for research data studies (see the Study definition below) that can be customized and managed by its owner.

3. What is a Study?
Study: A container for a research data set. It includes cataloging information, data files and complementary files.

4. Why share research data with the Harvard Dataverse Network?

    • Fulfill data management plan requirements.
    • Get recognition and credit via data citations.
    • Customize your Dataverse using branding or embedding.
    • Allow collaborators to contribute to your Dataverse.
    • Restrict data to your project team until ready for public release.
    • Facilitate the discovery and reuse of your data through extensive cataloging.
    • Enable reproducible research, which contributes to the validation and verification of science.



For more information, please refer to our Features page.

5. How does the Dataverse Network encourage long term preservation & good archival practices?
Long term preservation and good archival features include:

  • Converting tabular data sets to a "preservation" format (a plain text file) that is independent of the original statistical package (which is sometimes proprietary), so the data can still be accessed 20, 30, or 50 years from now. (Note: this could be extended to other data types, but it is already an important feature for many of the datasets.)
  • Exporting metadata (both descriptive and technical) to preservation formats.
  • Standard XML metadata (using our export mechanism, used for harvesting support).
  • Support for DOIs: a DOI is a global persistent identifier that resolves permanently to the dataset's landing page, independent of the software or server that hosts the dataset in the future.
  • LOCKSS: support for keeping replicas of the data and metadata in multiple locations, compliant with good archival practices. If you are interested in joining DATA-Pass as a partner to replicate the data in multiple partner locations using LOCKSS, you can learn more about DATA-Pass here (note that the main focus is on social science): http://www.data-pass.org/
  • Versioning and deaccession compliant with good archival practice.
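
The first feature above, keeping tabular data in a plain text form that does not depend on any statistical package, can be illustrated with a short sketch. This is not the Dataverse Network's own conversion code: the function names `to_preservation_text` and `from_preservation_text` and the choice of a tab-delimited layout are assumptions made purely for illustration.

```python
import csv
import io


def to_preservation_text(rows, header):
    """Serialize tabular data as tab-delimited plain text.

    Hypothetical helper: the tab-delimited layout is an assumption,
    not the format the Dataverse Network actually produces.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def from_preservation_text(text):
    """Read the plain text form back using only the standard library."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return header, [row for row in reader]
```

The point of the sketch is that the plain text form can be read back with nothing more than a generic text parser, which is what makes it suitable for long term access.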

6. Is the data permanently stored?
Yes. Through our collaboration with Harvard University Library, the data deposited in the Harvard Dataverse Network are stored permanently.

7. Who can deposit data at the Harvard Dataverse Network?
Any researcher worldwide (faculty, postdoc, student, or staff) can use the Harvard Dataverse Network to archive, find and share research data sets. However, check with your institution or organization regarding any additional restrictions on data access.

8. Is it free for me to deposit data?
Yes, depositing research data on the Harvard Dataverse Network is free* for anyone within or outside of Harvard.
* If you plan to upload more than 1TB of data, please contact us.

9. Who can access data?
If the cataloging information and/or datasets are made publicly available, any member of the public may discover these datasets or request access to them. Access to the data and cataloging information (descriptive metadata) is provided in accordance with the terms specified by the data depositor.

10. Will I have control of my data?
Yes, you can control access to your data and the associated cataloging information for your studies (see question 11 below). Once data is deposited, you will also have control over the versioning of the data set and its associated metadata.

11. How can I control who has access to download or see the data and its metadata?
Under the “Dataverse File Permission Settings” you can choose 'Yes' to restrict ALL files in your dataverse. To restrict files individually, go to the Study Permissions page of the study containing the file. You can also grant permission to specific people or groups to access your data. For more information see our Dataverse User Guides for Managing Permissions.

12. What do I need to submit data?

  • Permission from the appropriate PI(s)

  • De-identified dataset and, if available, documentation files to support the data (e.g. ReadMe file(s), codebooks, etc.)

  • Information about your dataset for the Cataloging Information page. This metadata helps make your research discoverable so that others can find and cite your research easily. The more information you can provide the better!

13. Are there any file format requirements?
All file formats are supported by the Dataverse Network with a maximum size of 2GB per file. However, please ensure that any files of a specialized or proprietary nature are accompanied by any pertinent information that would allow the proper viewing and/or usage of the file. This information could be stored in a separate ‘Readme’ file within the data set files.
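
Before depositing, the two points above (the per-file size limit and the accompanying Readme) can be checked locally. The following is a minimal sketch, not part of the Dataverse Network software: the function name `check_files` is hypothetical, and interpreting "2GB" as 2 × 1024³ bytes is an assumption.

```python
import os

# Per-file limit stated above. Reading "2GB" as 2 * 1024**3 bytes is an
# assumption; the service may count the limit differently.
MAX_FILE_BYTES = 2 * 1024**3


def check_files(paths, max_bytes=MAX_FILE_BYTES):
    """Return (oversized, has_readme) for a list of local file paths.

    `oversized` lists the paths whose size exceeds `max_bytes`.
    `has_readme` is True if any file name starts with "readme",
    ignoring case. Hypothetical helper for illustration only.
    """
    oversized = [p for p in paths if os.path.getsize(p) > max_bytes]
    has_readme = any(
        os.path.basename(p).lower().startswith("readme") for p in paths
    )
    return oversized, has_readme
```

For example, `check_files(["survey.sav", "ReadMe.txt"])` would report any file over the limit and confirm that a Readme is included, assuming both files exist locally.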

The Dataverse Network offers additional support for:

    • Subset and analysis for tabular datasets: Files uploaded in SPSS, STATA, and R formats gain additional subsetting and analysis services and can be downloaded in multiple formats.
    • Subset for social network data: Files uploaded in GraphML gain additional subsetting and network measurement services. See the example of a dataverse with graph data files.
    • Metadata extraction for searching of FITS files.

14. What best practices can I follow for data preparation?
See our replication guidelines for detailed best practices on how to prepare your data for deposit into the Dataverse Network. For tabular datasets, you may also find the following paper useful: "Nine simple ways to make it easier to (re)use your data" by Ethan P. White, Elita Baldridge, Zachary T. Brym, et al.

15. Can I see if someone downloads/views the data?
To find out how to obtain the number of downloads or views per dataset, please refer to the "Download Tracking Data" section of our Dataverse User Guides.

16. How can I get credit for the data if someone uses it?
Your Study’s Cataloging Information page provides data re-users with the proper data citation (including persistent identifier and permalink). Your data Terms of Use agreement contains a term requiring data re-users to provide proper attribution, so they must cite your data when and where it is appropriate.

Excerpt from Dataverse Network Data Terms of Use:

“I agree that any books, articles, conference papers, theses, dissertations, reports, or other publications that I create which employ data reference the bibliographic citation accompanying this data. These citations include the data authors, data identifier, and other information accord with the Recommended Standard (http://thedata.org/book/standard).”

[Screenshot: example of a Data Citation with DOI on the Dataverse Network Cataloging Information page]


17. How do I upload my research data onto the Harvard Dataverse Network?
To learn how to upload your research data onto the Harvard Dataverse Network, please visit our visual guide to Getting Started with the Harvard Dataverse Network.