Using Git for Research Data Management: Opportunities within the Max Planck Society

A Very Short Introduction to Git

Git is free software for distributed version control of files and is today the de facto standard for source code version management. Because Git is distributed, every user holds a full copy of the repository on their own computer, works on files locally, and creates new versions (commits) and branches there. New features are normally developed on separate branches, which the user later merges into the “master” branch. Each commit receives a hash value computed from its content, which identifies the commit uniquely and makes later tampering detectable. The best-known hosting platforms built around Git are GitHub (run by Microsoft) and GitLab (an open-source project). Both have a free basic offering and offer paid plans with additional functionality and support.
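To make the idea of a content hash more concrete, the following minimal Python sketch reproduces the object ID that Git assigns to the content of a single file (a "blob"). It assumes a repository using Git's default SHA-1 object format; the function name and the sample CSV content are made up for illustration. Commit hashes are built in a similar way, but they additionally cover the directory tree, the parent commit, the author, and the message.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file with this content."""
    # Git hashes a short header ("blob <size in bytes>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Hypothetical data file content, before and after a one-digit change.
    original = b"sample_id,value\nA1,0.42\n"
    modified = b"sample_id,value\nA1,0.43\n"
    print(git_blob_hash(original))
    print(git_blob_hash(modified))
```

Even a one-digit change in the data yields a completely different ID, which is why an altered file cannot silently pass as an earlier version.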

How to Use Git for Research Data Management

Git systems enable versioned storage of files. They are ideal for software development and also well suited to collaborative work on research data. During a research process, different versions of files often accumulate over time. These should be documented in a comprehensible way, in line with the FAIR principles. Git can support this in many cases. To learn how to use Git and version control, the Git lessons by Software Carpentry are highly recommended.
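As a rough illustration of how successive versions of a data file can be documented, the sketch below drives the standard Git command line from Python. It is only one possible approach and assumes that the `git` executable is installed; the file name, the author identity, and the commit messages are placeholders chosen for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    cmd = [
        "git",
        # Placeholder identity so the sketch runs without any global Git config.
        "-c", "user.name=Example Researcher",
        "-c", "user.email=researcher@example.org",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def record_versions(repo: Path) -> list[str]:
    """Commit two versions of a data file; return the commit subjects, newest first."""
    git(repo, "init")
    data_file = repo / "measurements.csv"  # hypothetical file name

    # First version of the data.
    data_file.write_text("sample_id,value\nA1,0.42\n")
    git(repo, "add", "measurements.csv")
    git(repo, "commit", "-m", "Add raw measurements for sample A1")

    # Second version: one more measurement, documented by its own commit message.
    data_file.write_text("sample_id,value\nA1,0.42\nA2,0.57\n")
    git(repo, "add", "measurements.csv")
    git(repo, "commit", "-m", "Add measurement for sample A2")

    # The log is the human-readable record of what changed and why.
    return git(repo, "log", "--format=%s").splitlines()


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git is not installed; skipping the demonstration")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            for subject in record_versions(Path(tmp)):
                print(subject)
```

Descriptive commit messages are what turn a pile of file versions into a comprehensible history, so it is worth stating what changed in the data and why.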

Sustainable Data with Git

Git also offers different ways to apply the FAIR principles. For example, the DFG project “Conquaire” at Bielefeld University offers quality control for research data to ensure reproducibility.

Public data from github.com and gitlab.com are automatically archived by Software Heritage (www.archive.softwareheritage.org). You can rely on this kind of passive archiving. However, for more sustainable documentation together with a DOI, we strongly recommend depositing the data in a research data repository.

Git Systems within the MPG

There are three central Git systems for the Max Planck Society. First, the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen operates a GitLab system: https://gitlab.gwdg.de. A short description can be found here. Second, the Max Planck Computing and Data Facility offers a GitLab instance for Max Planck researchers: https://gitlab.mpcdf.mpg.de. Third, the Max Planck Institute for Molecular Genetics runs a GitHub system that is open to the whole Max Planck Society: https://github.molgen.mpg.de. As an MPG researcher, you can get an account via their helpdesk (helpdesk [at] molgen [dot] mpg [dot] de).

The German Climate Computing Centre also offers a Git service at https://gitlab.dkrz.de. However, it is limited to MPG researchers who are part of a DKRZ project.

Some Max Planck Institutes also run their own Git systems for their researchers, for example https://git.mpib-berlin.mpg.de or https://gitlab.pks.mpg.de. In any case, ask the local IT support at your institute; they can always help you.

GitHub and GitLab are centrally licenced for the Max Planck Society by the Software Licensing Group of the MPDL. If you have a question about this, please ask your local software licensing officer at your institute.

Stay Up to Date

There are many ways for scientists to keep up to date with Git systems. Besides direct exchange with other researchers, a general DFN mailing list, for example, can be helpful for this purpose. This overview of academic libraries on GitHub can also be helpful.

If you need support with using a Git system or with anything else, please let us know via our RDM support service.

And if you still have not had enough of Git, then the game OhMyGit might be just the thing for you.

Further Reading

Ayer, V., Herrmann, F., Peil, V., Pietsch, C., Rempel, A., Schirrwagen, J., Vompras, J., & Wiljes, C. (2019): Automatische Qualitätskontrolle von Forschungsdaten durch kontinuierliche Integration mit GitLab CI, https://pub.uni-bielefeld.de/record/2939350.

Blischak, J. D., Davenport, E. R., & Wilson, G. (2016): A Quick Introduction to Version Control with Git and GitHub. PLOS Computational Biology, 12(1), e1004668, doi:10.1371/journal.pcbi.1004668.

Bryan, J. (2017): Excuse me, do you have a moment to talk about version control? PeerJ, e3159v2, doi:10.7287/peerj.preprints.3159v2.

Eaton, M. (2018): A Comparative Analysis of the Use of GitHub by Librarians and Non-Librarians. Publications and Research, https://academicworks.cuny.edu/kb_pubs/134.

Perez-Riverol, Y. et al. (2016): Ten Simple Rules for Taking Advantage of Git and GitHub. PLOS Computational Biology, 12(7), e1004947, doi:10.1371/journal.pcbi.1004947.

Ram, K. (2013): Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine, 8(1), 7, doi:10.1186/1751-0473-8-7.