Decentralized Identifiers for Research Data

Decentralized Identifiers (DIDs) enable verifiable, decentralized digital identification that is independent of “centralized registries, identity providers, and certificate authorities” (W3C). Within research data management, they could become relevant in two areas: as a form of persistent identifier (PID) and as a building block of self-sovereign identity solutions.

DID format: did:example:123456789abcdefghi

did: – Scheme
example: – DID method
123456789abcdefghi – DID method-specific identifier
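
The three parts can be separated mechanically. The following is a minimal Python sketch that splits a DID into scheme, method and method-specific identifier. The regular expression follows our reading of the DID syntax in the W3C recommendation; the names parse_did and ParsedDID are chosen for illustration only and do not come from any library.

```python
import re
from typing import NamedTuple

# Characters allowed in a method-specific identifier segment
# (letters, digits, ".", "-", "_" or a percent-encoded byte).
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"

# did:<method>:<method-specific-id>; the identifier may contain further
# colon-separated segments, but must not end with an empty segment.
_DID_RE = re.compile(rf"did:([a-z0-9]+):((?:{_IDCHAR}*:)*{_IDCHAR}+)")


class ParsedDID(NamedTuple):
    scheme: str
    method: str
    method_specific_id: str


def parse_did(did: str) -> ParsedDID:
    """Split a DID string into its three components or raise ValueError."""
    match = _DID_RE.fullmatch(did)
    if match is None:
        raise ValueError(f"not a syntactically valid DID: {did!r}")
    return ParsedDID("did", match.group(1), match.group(2))


if __name__ == "__main__":
    print(parse_did("did:example:123456789abcdefghi"))
```

Note that a syntactically valid DID says nothing about whether the method exists or whether the identifier can actually be resolved.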

DIDs as persistent identifiers

Persistent identifiers (PIDs) are common in research data management. They appear, for example, as Handle, DOI or URN identifiers that describe publications or datasets in a globally unique way and identify them persistently. They are also used to identify researchers and organizations unambiguously, for example through ORCID iDs or ROR IDs. All of these PIDs address a problem of plain URLs: they reference data independently of its current location. However, they still depend on central databases run by specific authorities. The organizations or companies that provide the infrastructure to resolve the PIDs could at any point run into difficulties, for example in maintaining their servers, acquiring further funding, or preventing criminal activity. If they can no longer provide their services, or can no longer provide them securely, the PIDs would become useless or even harmful.

The decentralized nature of DIDs is meant to circumvent precisely this problem. A DID is primarily a URI that represents a so-called “DID Subject”. As with PIDs, DID Subjects can be persons, organizations, things or even concepts. In contrast to PIDs, DIDs do not resolve to URLs that are mapped by central web-based resolving systems and databases, but to a “DID Document”. The DID Document contains the metadata associated with the DID and is typically written in JSON-LD. Besides the DID itself, it typically includes components such as “public keys, service endpoints, time stamps, digital signatures, cryptographic verification methods and authentication methods” (Gabriel 31).

How a DID can be created, read, updated, deactivated or otherwise interacted with depends on the “DID Method”. The DID Method therefore also defines the kind of verifiable data registry in which the DID is recorded. These methods can be based on a variety of technologies, e.g. “distributed ledgers, decentralized file systems, databases of any kind, peer-to-peer networks, and other forms of trusted data storage.” (W3C) The reliability and trustworthiness of a DID depend heavily on the selected method. To resolve a DID to its DID Document, the Universal Resolver can be used, or institutions can run their own DID resolver.
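
To make the relationship between DID, DID Document and DID Method more concrete, the following Python sketch models a toy registry that supports the four operations mentioned above (create, read, update, deactivate). It is an illustration under stated assumptions, not a real DID method: the class InMemoryRegistry, the key value "zPLACEHOLDER", the service type "DatasetLandingPage" and the repository URLs are all hypothetical. The property names of the DID Document follow our reading of the W3C recommendation.

```python
import copy


class InMemoryRegistry:
    """Toy stand-in for a verifiable data registry (hypothetical, not a real DID method)."""

    def __init__(self):
        self._documents = {}
        self._deactivated = set()

    def create(self, did, public_key_multibase, service_endpoint):
        """Record a new DID together with a minimal DID Document."""
        if did in self._documents:
            raise ValueError(f"{did} already exists")
        self._documents[did] = {
            "@context": [
                "https://www.w3.org/ns/did/v1",
                "https://w3id.org/security/suites/ed25519-2020/v1",
            ],
            "id": did,
            "verificationMethod": [{
                "id": f"{did}#key-1",
                "type": "Ed25519VerificationKey2020",
                "controller": did,
                "publicKeyMultibase": public_key_multibase,
            }],
            "authentication": [f"{did}#key-1"],
            "service": [{
                "id": f"{did}#landing-page",
                "type": "DatasetLandingPage",  # hypothetical service type
                "serviceEndpoint": service_endpoint,
            }],
        }
        return self.resolve(did)

    def resolve(self, did):
        """Read: return a copy of the DID Document plus resolution metadata."""
        if did not in self._documents:
            raise KeyError(did)
        return {
            "didDocument": copy.deepcopy(self._documents[did]),
            "didDocumentMetadata": {"deactivated": did in self._deactivated},
        }

    def update_service_endpoint(self, did, service_endpoint):
        """Update: point the DID at a new location without changing the DID."""
        if did in self._deactivated:
            raise ValueError(f"{did} is deactivated")
        self._documents[did]["service"][0]["serviceEndpoint"] = service_endpoint

    def deactivate(self, did):
        """Deactivate: the DID stays on record but is flagged as no longer active."""
        if did not in self._documents:
            raise KeyError(did)
        self._deactivated.add(did)


if __name__ == "__main__":
    registry = InMemoryRegistry()
    did = "did:example:123456789abcdefghi"
    registry.create(did, "zPLACEHOLDER", "https://repository.example.org/datasets/42")
    registry.update_service_endpoint(did, "https://archive.example.org/datasets/42")
    print(registry.resolve(did)["didDocument"]["service"][0]["serviceEndpoint"])
```

In a real DID method, the dictionary inside this class would be replaced by the method's verifiable data registry, and updates would have to be authorized with the keys listed in the DID Document.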

DIDs within self-sovereign identity solutions

The aim of Self-Sovereign Identity (SSI) solutions is to give individuals more control over their data and digital privacy by allowing trustworthy interactions between two parties that are independent of third parties such as companies or governments. This is made possible by authenticating data with cryptographic proofs that each individual can generate. SSI solutions rely on trust networks in which Issuers, Verifiers and Holders exchange cryptographically secured data about a certain Subject through verifiable credentials (VCs). While in many cases the Subject and the Holder are identical, the Subject could also be another person, organization, animal, thing or concept. A simple example of such a data transaction: a university issues a graduating student a VC attesting their Master’s degree. The student keeps this digital certificate in a digital wallet until they need to present proof of their academic title to a new employer. The VC is then shared with the employer, who verifies the cryptographic proofs of both the university and the new employee. All participants in these interactions use DIDs to identify themselves unambiguously and to link the interactions between them.

In this use case, DIDs do not only function as static identifiers that help, for example, to distinguish one dataset from another; they also become important for the interactions between the entities surrounding research data. This could be of interest where rights concerning access or data protection are granted or received.

Conclusion

While the decentralized nature of DIDs removes the dependency on individual providers, the reliability of the different DID methods is still difficult to assess. Other factors should also be considered, such as the high environmental impact of some of the cryptographic proof systems used in blockchain technology. Any implementation would therefore require significant human and technical resources, and to our knowledge there are currently no precedents in the area of research data management.


Further Reading

Sporny, M., Longley, D., Sabadello, M., Reed, D., Steele, O., & Allen, C. (2022): Decentralized Identifiers (DIDs) v1.0, www.w3.org, World Wide Web Consortium, https://www.w3.org/TR/2022/REC-did-core-20220719/.

Bach, N. (2021): Dezentrale Identifikatoren (DIDs) – Die nächste PID-Evolution: selbstsouverän, datenschutzfreundlich, dezentral. O-Bib. Das Offene Bibliotheksjournal / Herausgeber VDB, 8(4), pp. 1–20. https://doi.org/10.5282/o-bib/5755.

Gabriel, V. (2022): Persistent-Identifier-Systeme für Forschungsdaten – Eine analytische Vergleichsstudie mit Empfehlungen zur Implementierung an wissenschaftlichen Einrichtungen, Master's thesis, Fachhochschule Potsdam, unpublished.

Preukschat, A. (2021): Self-Sovereign Identity (1st edition), Manning Publications, ISBN-13 978-1617296598.