[Novel] WikiMed-DE: Constructing a Silver-Standard Dataset for German Biomedical Entity Linking using Wikipedia and Wikidata

Published: 29 Aug 2023, Last Modified: 09 Oct 2023, ISWC 2023 Workshop Wikidata Submission
TL;DR: A large, automatically annotated dataset for German biomedical entity linking.
Abstract: This paper introduces WikiMed-DE, a silver-standard, automatically annotated biomedical entity linking dataset for the German language. WikiMed-DE comprises 53,981 articles from the German Wikipedia, annotated with 1,951,081 mentions that correspond to 317,010 unique mention URLs. The hyperlinks in Wikipedia articles are used to connect concept mentions to Wikidata and, transitively, to three biomedical concept identifiers: the Concept Unique Identifier (CUI) from the Unified Medical Language System (UMLS), the MeSH ID from the Medical Subject Headings hierarchy, and the DOID from the Disease Ontology. A curated subset, WikiMed-DE-BEL, is released as a ready-to-use benchmark for biomedical entity linking in German. It contains the same number of articles as WikiMed-DE but retains only the highest-quality annotations: 413,913 mentions corresponding to 35,012 unique concepts. Both resources are available at: https://doi.org/10.5281/zenodo.8188966.
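The abstract describes the transitive linking step only at a high level. Below is a minimal Python sketch of how one hyperlinked mention could be resolved from its German Wikipedia link target to a Wikidata item and then to the three biomedical identifiers. It assumes offline lookup tables (page title to QID, and QID to property values) have already been extracted. `Mention`, `link_mention`, and the toy tables are hypothetical illustrations, not the authors' pipeline. The property IDs are Wikidata's P2892 (UMLS CUI), P486 (MeSH descriptor ID), and P699 (Disease Ontology ID).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import unquote

# Wikidata properties that hold the three target identifiers.
TARGET_PROPERTIES = {
    "UMLS_CUI": "P2892",  # UMLS CUI
    "MeSH_ID": "P486",    # MeSH descriptor ID
    "DOID": "P699",       # Disease Ontology ID
}


@dataclass
class Mention:
    surface: str                # anchor text as it appears in the article
    url: str                    # hyperlink target of the anchor
    qid: Optional[str] = None   # Wikidata item, if the link resolves
    concept_ids: Dict[str, List[str]] = field(default_factory=dict)


def title_from_url(url: str) -> str:
    """Turn a Wikipedia article URL into its page title (decodes %XX, '_' -> ' ')."""
    return unquote(url.rsplit("/wiki/", 1)[-1]).replace("_", " ")


def link_mention(mention: Mention,
                 sitelinks: Dict[str, str],
                 claims: Dict[str, Dict[str, List[str]]]) -> Mention:
    """Resolve a hyperlinked mention to Wikidata, then to biomedical IDs.

    sitelinks: German Wikipedia page title -> Wikidata QID (hypothetical table)
    claims:    QID -> {property ID -> list of values} (hypothetical table)
    """
    qid = sitelinks.get(title_from_url(mention.url))
    if qid is None:
        return mention  # link target has no known Wikidata item
    mention.qid = qid
    item_claims = claims.get(qid, {})
    for name, pid in TARGET_PROPERTIES.items():
        values = item_claims.get(pid)
        if values:  # keep only identifiers the item actually carries
            mention.concept_ids[name] = list(values)
    return mention


if __name__ == "__main__":
    # Toy tables with placeholder identifiers (not real Wikidata data).
    sitelinks = {"Beispielkrankheit": "Q_DEMO"}
    claims = {"Q_DEMO": {"P2892": ["C_DEMO"], "P486": ["D_DEMO"]}}
    m = link_mention(
        Mention("Beispielkrankheit",
                "https://de.wikipedia.org/wiki/Beispielkrankheit"),
        sitelinks, claims)
    print(m.qid, m.concept_ids)
    # Q_DEMO {'UMLS_CUI': ['C_DEMO'], 'MeSH_ID': ['D_DEMO']}
```

A real pipeline would also need to follow Wikipedia redirects before the sitelink lookup and decide how to treat items that carry several values for one property; this sketch leaves both out.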
