Published research data have limited potential for re-use if the terms of re-use are not clearly defined. It is therefore crucial to assign a license to a dataset. Community standards and good scientific practice should still be observed even where they are not enforced by a license. All of the following licenses allow open re-use of data, subject to certain conditions.

Creative Commons

Creative Commons licenses were initially designed for licensing cultural material, but the latest version, 4.0, is also suitable for scientific data. While they do not specifically address the characteristics of (scientific) data, they are widely known and adopted, so users are likely to be familiar with their meaning. Each license comes with a summary (in many languages) that is understandable without a legal background, as well as the full license text.

Recommended licenses are CC BY (Attribution) or CC0 (Public Domain, No Rights Reserved). CC BY legally obliges re-users to give credit to data producers (without implying endorsement). CC0 does not include such a formal obligation, but good scientific practice still requires giving due credit. In some re-use scenarios (such as compiling databases from many different sources), the formal attribution requirement of CC BY might become a practical hurdle for re-use. You might want to choose CC0 if enabling re-use is your primary concern.

Other conditions can be imposed but are not recommended, for various reasons. SA (ShareAlike) requires that works derived from the data be distributed under the same license (“copyleft”). This can pose problems if data from various sources, provided under different licenses, are aggregated. NC (NonCommercial) prohibits commercial use, but it can be challenging to define what exactly represents commercial (re-)use. The non-commercial restriction may unintentionally hinder re-use by scientists: small commercial side activities of a scientist or their organization (e.g. patent royalties, income from events, sponsoring or funding with commercial interest) can lead to a classification as commercial, which triggers the NC restriction and prevents even scientific re-use. ND (NoDerivatives) allows re-users to distribute the data only in the original form, without any adaptations. While this could make sense in scenarios involving, for example, personal data, testimonies or certificates, in most cases it dramatically narrows the possibilities for re-use.
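The effect of these conditions on aggregated data can be illustrated with a minimal Python sketch. It assumes that licenses are written as SPDX-style identifiers such as CC-BY-NC-4.0; the function names and the sample list of sources are hypothetical and not part of any Creative Commons tooling. Taking the simple union of all conditions is a deliberate simplification: it shows how restrictions from individual sources accumulate, but it does not decide whether the licenses are legally compatible with one another.

```python
# Hypothetical helper for illustration only; not part of any official CC tooling.

# The four Creative Commons license elements and what they demand of re-users.
CONDITIONS = {
    "BY": "attribution required",
    "SA": "derived works must be distributed under the same license",
    "NC": "no commercial use",
    "ND": "no adaptations may be distributed",
}


def license_conditions(identifier):
    """Return the condition codes found in an identifier such as 'CC-BY-NC-4.0'."""
    parts = identifier.upper().split("-")
    if parts[0] == "CC0":
        # CC0 waives rights and carries no conditions.
        return set()
    # Skip the leading 'CC' and keep only known condition codes;
    # the version number (e.g. '4.0') is ignored.
    return {part for part in parts[1:] if part in CONDITIONS}


def aggregate_conditions(identifiers):
    """Union of the conditions of all sources (a simplification, see lead-in)."""
    combined = set()
    for identifier in identifiers:
        combined |= license_conditions(identifier)
    return combined


# Assumed example: a database compiled from three differently licensed sources.
sources = ["CC0-1.0", "CC-BY-4.0", "CC-BY-NC-4.0"]

for code in sorted(aggregate_conditions(sources)):
    print(code, "-", CONDITIONS[code])
```

In this assumed example, a single NC-licensed source is enough to place the non-commercial restriction on the compiled database, even though the other two sources are more permissive.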

Copyright law does not cover data in general. Depending on national legislation, some data and data schemes (especially those resulting from a significant creative process by humans) may be covered by copyright, while other data (especially those resulting from a simple measurement process) may not be. In addition, certain re-use scenarios (cf. fair use in the US, or personal scientific use and formal citation in Germany) may be freely allowed even if the data or data schemes in question are in principle protected by copyright. Therefore, the legal terms of CC licenses may fail to impose the intended obligations or restrictions on some data objects and in some re-use scenarios.

Open Data Commons

Open Data Commons licenses were explicitly designed for data. One key feature is that different licenses can be applied to a database as a whole and its content – this is relevant, for example, if an image database compiles images from various sources with some images copyrighted. These licenses also have human-readable summaries as well as full legal text.

The analogues of the CC licenses mentioned above are ODC-By (Open Data Commons Attribution License) and PDDL (Public Domain Dedication and License). Open Data Commons licenses are better suited to tagging data, but they are not as widely used as CC licenses.

Other Licenses

Datenlizenz Deutschland was created for administrative government data. Although these licenses are exceedingly short and precise, are consistent with international open data standards, and have license text in both German and English, an international audience may not be familiar with them.

Further Reading

