Universal Numerical Fingerprint

The Universal Numerical Fingerprint (UNF) is the part of the citation standard for data sets that identifies a data set by its content. The UNF algorithm works in three steps. First, it computes an approximation of the semantic content of a digital object (for example, by rounding numeric values to a fixed number of significant digits). Second, it puts this approximated content into a normalized (canonical) form. Third, it applies a hash function to the normalized, approximated object to produce a fingerprint that, for practical purposes, uniquely identifies it. The resulting hash, a string of characters, is therefore independent of the storage medium and file format of the object.

Before Dataverse Network software version 2.0, the Dataverse Network Project implemented version 3 of the UNF algorithm in R. Starting with Dataverse Network software version 2.0, UNF version 5 is implemented in Java. If a study was created in a dataverse hosted by a Dataverse Network running software earlier than version 2.0, the UNFs for that study and all of its subsettable files comply with the UNF version 3 standard. After that Dataverse Network is upgraded to software version 2.0 or later, all new studies and subsettable files contributed to its dataverses comply with the UNF version 5 standard. If a new subsettable file is uploaded to an existing study whose UNF was calculated with version 3, the new file's UNF is calculated with version 5, and the study's UNF is also recalculated with version 5.
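To make the three steps (approximate, normalize, hash) concrete, here is a minimal Python sketch. It is not the official UNF version 3 or version 5 algorithm. The following are all illustrative assumptions: keeping 7 significant digits, the exponential-notation canonical form, the newline-plus-null terminator after each value, SHA-256 truncated to 128 bits, Base64 output, and the hypothetical `UNF-sketch:` prefix. The hypothetical `combine` function, which derives a study-level fingerprint from sorted file-level fingerprints, is likewise only an assumption about how a study fingerprint could be built from its files.

```python
import base64
import hashlib
from typing import Iterable

DIGITS = 7  # assumed number of significant digits kept during approximation


def approximate(value: float, digits: int = DIGITS) -> str:
    """Approximate and normalize one numeric value.

    Rounds to `digits` significant digits and renders the result in a single
    canonical exponential form, e.g. 3.14159265 -> '+3.141593e+00'.
    Integers are converted to float first, so 1 and 1.0 normalize identically.
    """
    return format(value, f"+.{digits - 1}e")


def _digest(items: Iterable[str]) -> str:
    # Hash each normalized item followed by an assumed terminator, so that
    # ['ab', 'c'] and ['a', 'bc'] produce different byte streams.
    h = hashlib.sha256()
    for item in items:
        h.update(item.encode("utf-8") + b"\n\x00")
    # Keep the first 128 bits (an assumption) and encode them as Base64 text.
    return "UNF-sketch:" + base64.b64encode(h.digest()[:16]).decode("ascii")


def fingerprint(values: Iterable[float], digits: int = DIGITS) -> str:
    """Fingerprint one column of numbers: approximate, normalize, then hash."""
    return _digest(approximate(v, digits) for v in values)


def combine(fingerprints: Iterable[str]) -> str:
    """Hypothetical study-level fingerprint built from file-level fingerprints.

    Sorting first makes the result independent of the order of the files.
    """
    return _digest(sorted(fingerprints))


if __name__ == "__main__":
    as_floats = fingerprint([1.0, 2.5, 3.14159265])
    as_ints_and_noise = fingerprint([1, 2.5000000001, 3.141592654])
    print(as_floats == as_ints_and_noise)  # True: same approximated content
    print(as_floats == fingerprint([1.0, 2.6, 3.14159265]))  # False
```

Because the hash is computed from the rounded, canonical text rather than from the raw stored bytes, holding the same values as integers or as floats does not change the fingerprint, while changing a value beyond the retained precision does. The trade-off is that differences smaller than the retained precision are deliberately invisible to the fingerprint.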

Learn more:

Altman, Micah, and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data." D-Lib Magazine 13. Copy at http://j.mp/ikyBfV