The Dataverse Network Project standardizes the citation of data sets. Before this Project, many publications cited data inconsistently or not at all, which left future access and scholarly recognition highly uncertain. When you contribute a study to the Dataverse Network, its citation is generated and displayed automatically.
The sections below describe how the Project implements citations: first the citation standard itself, then the universal numerical fingerprint algorithm.
The citation standard defined here offers proper recognition to authors as well as permanent identification through the use of global, persistent identifiers in place of URLs, which can change frequently. Use of universal numerical fingerprints (UNFs) guarantees to the scholarly community that future researchers will be able to verify that data retrieved is identical to that used in a publication decades earlier, even if it has changed storage media, operating systems, hardware, and statistical program format.
The following is an authentic example of a replication data-set citation (from International Studies Quarterly, King and Zeng, 2007, p. 209):
Gary King; Langche Zeng, 2006, "Replication Data Set for 'When Can History be Our Guide? The Pitfalls of Counterfactual Inference'" hdl:1902.1/DXRXCFAWPK UNF:3:DaYlT6QSX9r0D50ye+tXpA== Murray Research Archive [distributor]
This citation has six components. Three are readable by humans: the author, title, and year. Two are machine-readable, and one is optional. Of the machine-readable components, the unique global identifier begins with "hdl" (which refers to the international handle system), and the universal numerical fingerprint begins with "UNF". The global identifier is designed to persist even if URLs, or the web itself, are replaced with something else. When the citation appears online, the identifier is hot-linked to the URL that resolves it, which works in browsers available today. In print, the URL is also included in the citation.
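As an illustration of how the machine-readable components can be separated from the human-readable ones, here is a minimal Python sketch that splits the example citation into its parts. The regular expression and the function name `parse_citation` are assumptions made for this illustration; the pattern is derived only from the layout of the single example above and is not a parser published by the Project.

```python
import re

# The example citation quoted above, reproduced as a single string.
CITATION = (
    'Gary King; Langche Zeng, 2006, "Replication Data Set for '
    "'When Can History be Our Guide? The Pitfalls of Counterfactual Inference'\" "
    "hdl:1902.1/DXRXCFAWPK UNF:3:DaYlT6QSX9r0D50ye+tXpA== "
    "Murray Research Archive [distributor]"
)

# Hypothetical pattern: it assumes the field order and separators of that
# one example (authors, year, quoted title, handle, UNF, optional tail).
CITATION_PATTERN = re.compile(
    r'^(?P<authors>.+?), (?P<year>\d{4}), "(?P<title>.+)" '
    r"hdl:(?P<handle>\S+) "
    r"UNF:(?P<unf_version>\d+):(?P<unf_hash>\S+)"
    r"(?: (?P<optional>.+))?$"
)


def parse_citation(text):
    """Return the citation components as a dict, or raise ValueError."""
    match = CITATION_PATTERN.match(text)
    if match is None:
        raise ValueError("text does not follow the assumed citation layout")
    return match.groupdict()


parts = parse_citation(CITATION)
for name, value in parts.items():
    print(f"{name}: {value}")
```

The number between the two colons of the UNF component is read here as the algorithm version, which is consistent with the example using Version 3.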
Four features make the UNF especially useful:
Citations can also carry optional features in a standard format, such as "Murray Research Archive [distributor]", where the term in square brackets is a type selected from a given, controlled vocabulary.
Learn more: Micah Altman and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data," D-Lib Magazine, Vol. 13, No. 3/4 (March).
The UNF portion of the citation standard for data sets uses a specific algorithm to compute the approximated semantic content of a digital object. This approximated content is then put into a normalized (or canonicalized) form, and a hash function is used to compute a unique fingerprint for the resulting normalized, approximated object. The resulting hash (a string of characters) is thus independent of the storage medium and format of the object.
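The approximate, normalize, and hash pipeline described above can be sketched in a few lines of Python. This is only a conceptual illustration, not the UNF algorithm itself: the function name `sketch_fingerprint`, the use of Python's exponential formatting, the newline separator, SHA-256, and base64 encoding are all assumptions chosen for the example, so its output will not match a real UNF.

```python
import base64
import hashlib


def sketch_fingerprint(values, digits=7):
    """Illustrative fingerprint of a numeric vector (NOT a real UNF)."""
    # Approximate and normalize: every value becomes a string with a fixed
    # number of significant digits, regardless of how it was stored.
    canonical = "\n".join(f"{float(v):.{digits - 1}e}" for v in values)
    # Hash the normalized text and encode the digest as printable characters.
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# The same data arriving from two different "formats".
from_text_file = [float(s) for s in ["1.50", "2.0", "3.25"]]
from_binary_file = [1.5, 2, 3.25]

same = sketch_fingerprint(from_text_file) == sketch_fingerprint(from_binary_file)
changed = sketch_fingerprint([1.5, 2, 3.26]) != sketch_fingerprint(from_binary_file)
print(same, changed)
```

Because the hash is computed over the normalized values rather than the file bytes, the two representations agree, while a change in the data itself produces a different fingerprint.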
Version 3 of the UNF algorithm is currently used by the Dataverse Network Project. This algorithm can be applied to digital objects containing vectors of numbers, vectors of character strings, data sets comprising such vectors, and studies comprising one or more such data sets. Version 4 offers better security at the cost of a longer UNF.
The UNF algorithm applied to the content of a data set or study is as follows:
If an element is an IEEE 754, nonfinite, floating-point special value, represent it as the signed, lowercase, IEEE minimal printable equivalent (that is, +inf, -inf, or +nan).
Each character string comprises the following:
For example, the number pi at five digits is represented as +3.1415e+, and the number 300 is represented as the string +3.e+2.
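The character-string representation of a single number can be sketched as follows. This is a hedged illustration rather than the reference implementation: the function name `normalize_number` is hypothetical, truncation toward zero is assumed because it reproduces the 3.1415 in the pi example, the exponent digits are omitted when the exponent is zero to match that same example, and the handling of zero is a guess. A positive number takes a leading plus sign, so the sketch yields +3.1415e+ for pi at five digits and +3.e+2 for 300.

```python
import math
from decimal import ROUND_DOWN, Context


def normalize_number(value, digits=7):
    """Sketch of the numeric string form described above (assumptions noted)."""
    # Nonfinite IEEE 754 values use their minimal printable equivalents.
    if math.isnan(value):
        return "+nan"
    if math.isinf(value):
        return "+inf" if value > 0 else "-inf"

    sign = "-" if math.copysign(1.0, value) < 0 else "+"
    if value == 0:
        # Assumption: zero is not covered by the examples in the text.
        return sign + "0.e+"

    # Assumption: keep `digits` significant digits by truncating toward zero.
    ctx = Context(prec=digits, rounding=ROUND_DOWN)
    dec = ctx.create_decimal(repr(abs(float(value))))
    _, coefficient, exponent = dec.as_tuple()

    # Exponent of the number when written with a single leading digit.
    sci_exponent = len(coefficient) - 1 + exponent
    mantissa = "".join(str(d) for d in coefficient).rstrip("0")
    lead, rest = mantissa[0], mantissa[1:]

    exp_sign = "-" if sci_exponent < 0 else "+"
    # A zero exponent is written with no digits, as in the pi example.
    exp_digits = str(abs(sci_exponent)) if sci_exponent != 0 else ""
    return f"{sign}{lead}.{rest}e{exp_sign}{exp_digits}"


print(normalize_number(math.pi, digits=5))
print(normalize_number(300.0))
print(normalize_number(float("-inf")))
```

The float is converted through `repr` before truncation so that a value such as 0.3 is treated as the decimal 0.3 rather than as its slightly smaller binary expansion, which would otherwise truncate to 0.2999999.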
Learn more: Software for computing UNFs is available in an R Module, which includes a Windows standalone tool and code for Stata and SAS languages. See also Micah Altman and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data," D-Lib Magazine, Vol. 13, No. 3/4 (March); Micah Altman, Jeff Gill, and Michael McDonald, 2003, Numerical Issues in Statistical Computing for the Social Scientist, New York: John Wiley; and Micah Altman, Jeff Gill, and Michael McDonald, "R Modules for Accurate and Reliable Computing," UseR! 2006.