Opening up and linking type catalogues in Wikidata – increasing the visibility of natural history collections
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Keywords: collection agents, Linked Open Data, museum catalogues, natural history collections, Open Science, scholarly publications, taxonomy, type catalogues, Wikidata
Abstract: Natural history museums and collections around the world house several billion preserved specimens used by the scientific community to answer questions about the biodiversity and geodiversity on Earth. Collections are increasingly digitised, opened up and made accessible to the wider scientific community and beyond. However, initially only basic information is shared; further related metadata or information on historical contexts is often not connected.
Type specimens are the most important objects in these collections because they are associated with the names of new taxa, serving as a permanent reference. As name-bearing specimens, the type material is regularly examined by scientists as decisive objects for resolving taxonomic issues and clarifying species delimitation. Having easy access to type material is, therefore, an important prerequisite for facilitating research. Type catalogues list present and sometimes lost or missing type material of specific – historical and contemporary – collections, for certain taxonomic groups or research expeditions. Collection catalogues have been published for several centuries and comprise information such as the housing institutions of types and duplicate material, type localities, annotations, and sometimes illustrations.
The open, multilingual and multidisciplinary knowledge base Wikidata supports discoverability, transparency and accessibility of research data. It is community-curated, serves as a hub for external identifiers and provides structured, human- and machine-readable data. Wikidata already comprises data for a huge number of publications, including type catalogues, many of which are accessible via the Biodiversity Heritage Library (BHL) or other digital libraries. However, they are currently not easily searchable as a type of scholarly work.
Data from Wikidata can be reused by other platforms and tools. Well-curated high-quality datasets related to natural history collections require the use of community-agreed standards and persistent identifiers as well as international collaboration. This includes exchange with different organisations promoting standardisation and open access to biodiversity data such as the Global Biodiversity Information Facility (GBIF), the Consortium of European Taxonomic Facilities (CETAF) and Biodiversity Information Standards (TDWG). For example, a TDWG Task Group is developing a terminology on how to model research expeditions in Wikidata, and the BHL-Wiki Working Group is involved in data modelling. Such collaborations – also with the wider Wiki community – help to improve the modelling of type catalogues and other entities in Wikidata and to develop best practice recommendations.
In this study, new Wikidata items are created for articles of type catalogues published in different languages and academic journals, and existing items are enriched by adding different external identifiers (e.g. DOIs, BHL page IDs). In addition, further entities such as the type specimen holding institution(s) or collection agents connected to the material or collection are linked. The project started with type catalogues from the Museum für Naturkunde Berlin, and was then expanded to compile additional catalogues from around the world. The growing open dataset in Wikidata can be (re)used for research in different fields such as taxonomy and systematics, history of collections, digital humanities or provenance research. It highlights the potential of Wikidata for research and knowledge contextualisation.
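As an illustration of the enrichment described above, the following minimal sketch shows how one catalogue article might be represented as a list of property-value pairs before being added to Wikidata. The property and item IDs (P31, P1476, P356, P687, P921, Q13442814) are existing Wikidata identifiers, but the `CatalogueArticle` record, the `to_statements` helper and the use of "main subject" (P921) to link institutions or collection agents are assumptions made for this example and not necessarily the data model used in the study.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Existing Wikidata identifiers
INSTANCE_OF = "P31"
TITLE = "P1476"
DOI = "P356"
BHL_PAGE_ID = "P687"
MAIN_SUBJECT = "P921"            # assumption: used here to link related entities
SCHOLARLY_ARTICLE = "Q13442814"


@dataclass
class CatalogueArticle:
    """Hypothetical in-memory record for one type catalogue article."""
    title: str
    language: str                                  # language code of the title
    doi: Optional[str] = None
    bhl_page_ids: List[str] = field(default_factory=list)
    linked_entities: List[str] = field(default_factory=list)  # QIDs


def to_statements(article: CatalogueArticle) -> List[Tuple[str, object]]:
    """Turn a record into (property, value) pairs, skipping missing data."""
    statements: List[Tuple[str, object]] = [
        (INSTANCE_OF, SCHOLARLY_ARTICLE),
        # Titles are monolingual text: keep the language code with the string.
        (TITLE, (article.language, article.title)),
    ]
    if article.doi:
        statements.append((DOI, article.doi))
    for page_id in article.bhl_page_ids:
        statements.append((BHL_PAGE_ID, page_id))
    for qid in article.linked_entities:
        statements.append((MAIN_SUBJECT, qid))
    return statements
```

The pairs are deliberately tool-neutral: they could be serialised for a batch-editing tool or sent through the Wikidata API. Any DOI, page ID or QID passed in would have to be replaced by real values for an actual catalogue.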
Format: Paper (20 minutes presentation)
Submission Number: 40