- Keywords: Named Entity Recognition, Vernacular Plant Name Recognition, Scientific Plant Name Recognition, Botany
- TL;DR: Neural entity recognition for scientific and vernacular plant names applied to multiple German and English text genres.
- Abstract: The identification of taxonomic entities is central to natural language understanding and automated knowledge extraction in botanical contexts. In this paper, we present a semi-supervised approach to scientific and vernacular plant name recognition across different text genres in German and English. Our pipeline includes linguistic preprocessing and dictionary-based annotation using gazetteers for multiple scientific and vernacular entity labels. We train a state-of-the-art neural NER system on various datasets, exploiting token-level and character-level contextual features of natural language. In our evaluation, the entity tagger achieves F1-scores above 80% on a manually annotated test set and above 90% on the automatic annotations, for both languages. We discuss the insights gained from adopting several dataset- and language-specific parameter combinations in single- and cross-dataset evaluation settings. Our approach emphasizes the potential of domain-specific entity labels and low-effort data models trained on automatically annotated material to explore and computationally process lower-resourced fields and genres.
- Archival status: Archival
- Subject areas: Machine Learning, Natural Language Processing, Information Extraction
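
The dictionary-based annotation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a longest-match lookup over lowercased tokens and BIO-style output tags; the gazetteer entries, the label names `SCI` and `VERN`, and the function `annotate` are hypothetical and are not taken from the paper, whose actual preprocessing and label inventory may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: lowercased token tuples mapped to an entity label.
# The labels "SCI" (scientific) and "VERN" (vernacular) are illustrative only.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("bellis", "perennis"): "SCI",
    ("quercus",): "SCI",
    ("daisy",): "VERN",
    ("common", "daisy"): "VERN",
}


def annotate(tokens: List[str],
             gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER) -> List[str]:
    """Tag tokens with BIO labels using greedy longest-match gazetteer lookup."""
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so that "common daisy"
        # wins over the shorter entry "daisy".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No gazetteer entry starts at this position.
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "The common daisy ( Bellis perennis ) is widespread".split()
    for token, tag in zip(sentence, annotate(sentence)):
        print(f"{token}\t{tag}")
```

Token and tag columns produced this way resemble the automatically annotated material a neural tagger could be trained on. A real gazetteer would additionally need to handle inflection, abbreviations, and ambiguous common words, none of which this sketch attempts.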