Open Research Data

What are Open Research Data (ORD)?

Openness is currently one of the major issues in science, and this is especially true for research data. The phrase “open research data” describes data that are understood to be accessible to everyone in some way. For scientists, this offers new opportunities but also new challenges.

ORD are increasingly a concern for funding organisations, so a more concrete understanding of the term is needed. For the European Commission, “open research data refers to the data underpinning scientific research results that has no restrictions on its access, enabling anyone to access it.”((European Commission (2020): ‘Facts and Figures for Open Research Data’, https://ec.europa.eu/info/research-and-innovation/strategy/goals-research-and-innovation-policy/open-science/open-science-monitor/facts-and-figures-open-research-data_en.)) This is in line with many other descriptions, namely that open research data is freely accessible and can be used under certain conditions.((See for example also the definition at Daniel Dietrich and others (2021): ‘The Open Data Handbook’, https://opendatahandbook.org.)) The Commission also outlines four steps for turning data into open data.

FAIR Principles

The FAIR principles are usually mentioned as one of the central guides to ORD.((Mark D. Wilkinson and others, ‘The FAIR Guiding Principles for Scientific Data Management and Stewardship’, in: Scientific Data, 3.1 (2016), 160018, https://doi.org/10.1038/sdata.2016.18.)) These principles are widely recognised and are already applied in part. ORD are closely connected to the FAIR principles, as these define openness on a formal and technical level.

Nevertheless, openness should not be taken as a constraint. Science is sometimes more complex, so the FAIR principles cannot always be applied. The European Research Council (ERC) therefore emphasises that not all research data can be open. Where data raise, for example, privacy or security concerns, more control and limits on data access may be necessary. For the ERC, any restrictions on access to research data should be explicit and justified, and such data should still be managed in line with the FAIR principles.((ERC Scientific Council, ‘Open Research Data and Data Management Plans: Information for ERC Grantees’, 2019, https://erc.europa.eu/sites/default/files/document/file/ERC_info_document-Open_Research_Data_and_Data_Management_Plans.pdf. For more information about the legal aspects of research data, see e.g. Catherine Doldirina and others, ‘Legal Approaches for Open Access to Research Data’, LawArXiv, 2018, https://doi.org/10.31228/osf.io/n7gfa.))

Free but not Gratis

Making research data open takes time and effort. Thus, open research data are “free”, but not gratis.((Frederika Welle Donker, ‘Funding Open Data’, in: Open Data Exposed, ed. by Bastiaan van Loenen, Glenn Vancauwenberghe, and Joep Crompvoets, Information Technology and Law Series (2018), pp. 55–78, https://doi.org/10.1007/978-94-6265-261-3_4.)) It is only a small step, but even this should not become an additional hurdle. Overall, however, the effort seems to stay within limits. The Swiss National Science Foundation (SNSF) put the cost of making research data open at 0.2% of the annual requested budget in 2017.((Katrin Milzow and others, Open Research Data: SNSF Monitoring Report 2017-2018 (2020), p. 11, https://doi.org/10.5281/zenodo.3618123.)) Please also keep in mind that you can request funds for research data management in your funding application.

State-driven Initiatives

Open data is one of the overarching terms for open research data. The turn towards openness is not confined to science: for some years now, open data has also been increasingly demanded and promoted in politics and business. One reason is that open data becomes a common resource from which others can extract value.((Rob Inkpen, Ritienne Gauci, and Andy Gibson, ‘The Values of Open Data’, Area, 2020, https://doi.org/10.1111/area.12682.))

A key milestone here is the EU directive of 2019 on open data and the re-use of public sector information.((European Commission and European Parliament, Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on Open Data and the Re-Use of Public Sector Information, 172, 2019, OJ L, http://data.europa.eu/eli/dir/2019/1024/oj/eng.)) Agreements of the G7 provide for something similar, namely “sharing of research data as openly as possible”.((‘G7 Research Compact’, 2021, p. 2, https://www.bmbf.de/bmbf/shareddocs/downloads/files/G7_Research_Compact.pdf?__blob=publicationFile&v=4.)) The German government is promoting similar efforts.((See one of our blog posts and especially Open Data Handbuch, ed. by Bundesverwaltungsamt and Kompetenzzentrum Open Data, 2020, https://www.bva.bund.de/SharedDocs/Downloads/DE/Behoerden/Beratung/Methoden/open_data_handbuch.pdf?__blob=publicationFile&v=8.)) Other countries, such as France and China, show similar ambitions regarding the use, opening and dissemination of public data.((Premier ministre de la république française, Circulaire N°6264/SG du 27 Avril 2021, Relative à la politique publique de la donnée, des algorithmes et des codes sources, 2021, https://www.legifrance.gouv.fr/circulaire/id/45162 and Lili Zhang and others, ‘A Review of Open Research Data Policies and Practices in China’, Data Science Journal, 20.1 (2021), 3, https://doi.org/10.5334/dsj-2021-003. For other examples and the aspect of data ethics see also ‘Good Practice Principles for Data Ethics in the Public Sector’, ed. by OECD Digital Government and Data Unit, 2020, https://www.oecd.org/gov/digital-government/good-practice-principles-for-data-ethics-in-the-public-sector.pdf.))

A Special Type: Linked Open Data

A major advantage of open research data is that it is freely accessible. This makes data readily available and easier for other researchers to re-use. A linked open data format makes data more attractive and easier to analyse, re-use and integrate. Such “linkable” data allows other researchers to enrich it with the help of different resources.((Best Practices for Library Linked Open Data (LOD) Publication, ed. by LIBER Linked Open Data (LOD) Working Group, 2021, p. 3.)) The overarching goal is to weave a so-called network of knowledge, whereby structures, connections and contexts become visible and machine-readable.

Publications from lod-cloud.net, CC BY 4.0

The linked data paradigm emphasises the structure of the data by using triples: each statement is divided into three elements (subject, predicate and object). The description is usually based on RDF (Resource Description Framework), which makes the data not only accessible on the internet but also linkable to other scientific representations.((Yannis Charalabidis and others, ‘The Multiple Life Cycles of Open Data Creation and Use’, in: The World of Open Data: Concepts, Methods, Tools and Experiences, ed. by Yannis Charalabidis and others, Public Administration and Information Technology (2018), pp. 11–31 (p. 14), https://doi.org/10.1007/978-3-319-90850-2_2.))
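To make the triple structure more tangible, the following minimal sketch describes a dataset with three statements and prints them in the line-based N-Triples notation of RDF. It is an illustration only: the dataset and person addresses under example.org are hypothetical, the choice of Dublin Core terms as predicates is an assumption, and the rule that every object starting with "http" is treated as a link rather than as text is a simplification.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # IRI of the thing being described
    predicate: str  # IRI of the property that is stated
    obj: str        # IRI of another resource, or a plain text value


def is_iri(value: str) -> bool:
    # Simplification: anything that looks like a web address is a link.
    return value.startswith(("http://", "https://"))


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    s = f"<{triple.subject}>"
    p = f"<{triple.predicate}>"
    if is_iri(triple.obj):
        o = f"<{triple.obj}>"
    else:
        # Escape backslashes, quotes and line breaks inside text values.
        escaped = (
            triple.obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        o = f'"{escaped}"'
    return f"{s} {p} {o} ."


# Hypothetical dataset address, used for illustration only.
DATASET = "https://example.org/dataset/42"

triples = [
    Triple(DATASET, "http://purl.org/dc/terms/title", "Survey results 2021"),
    Triple(DATASET, "http://purl.org/dc/terms/license",
           "https://creativecommons.org/licenses/by/4.0/"),
    Triple(DATASET, "http://purl.org/dc/terms/creator",
           "https://example.org/person/jane-doe"),
]

for t in triples:
    print(to_ntriples(t))
```

Because the licence and the creator are given as links rather than as free text, other datasets that point to the same addresses become connected to this one. This is the "linkable" quality described above.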

The LOD cloud website shows datasets that have been published as linked open data. This gives a good impression of the diversity of linked open data. A prominent example of linked open data is Wikidata, which offers a low-barrier but high-quality method for making data not only visible but also reusable.((See for example Stacy Allison-Cassin and Dan Scott, ‘Wikidata: A Platform for Your Library’s Linked Open Data’, The Code4Lib Journal, 40, 2018, https://journal.code4lib.org/articles/13424.)) Other well-known examples are authority files such as VIAF or GND in the library sector.

Why can it be relevant for MPG Researchers?

ORD and linked open data can both be relevant for MPG researchers. Three main aspects can be seen as positive drivers.

  • First, ORD can lead to a better understanding of research outcomes, which significantly increases reproducibility. Based on the theses and the open data, others can follow how the results were classified and, ultimately, how the conclusions were reached.
  • Second, ORD also increase the possibilities for subsequent use. Other scientists can use the data for other research questions.((See for example Xiaoguang Wang, Qingyu Duan, and Mengli Liang, ‘Understanding the Process of Data Reuse: An Extensive Review’, Journal of the Association for Information Science and Technology, (2021), https://doi.org/10.1002/asi.24483.)) The availability of open research data therefore provides both the opportunity and the necessary legal basis, i.e. data licenses, to continue working with the data.
  • Third, it has been shown that the use of ORD also increases the visibility of research. Working with and publishing open research data, e.g. on Edmond, can lead to greater awareness of one’s own research and results.((Heather A. Piwowar and Todd J. Vision, ‘Data Reuse and the Open Data Citation Advantage’, PeerJ, 1 (2013), e175, https://doi.org/10.7717/peerj.175.))

It is also worth mentioning that similar discussions are currently under way in the field of research software.

If you are an MPG scientist and have questions about this topic or need advice, please contact our RDM support.

Latest News relating to Open Research Data