Automated Synonym Discovery for Taxonomy Maintenance Using Semantic Search Techniques

Published: 01 Jan 2024, Last Modified: 24 Feb 2025, Venue: NLDB (2) 2024, License: CC BY-SA 4.0
Abstract: Taxonomies group synonymous terms together into concepts, which are arranged hierarchically through “broader than” semantic relations. However, creating and maintaining taxonomies is labour-intensive, especially at a scale of hundreds of thousands or millions of terms. Here, we present an automated solution that supports taxonomy editors in identifying synonymous terms in scientific literature by leveraging semantic search techniques. Our method first encodes all taxonomy terms or phrases using a pre-trained BERT-based model. We then use FAISS vector search to efficiently discover synonyms for each term. We evaluate the method by comparing the terms it considers synonymous against a manually curated taxonomy of more than 770,000 terms. By integrating state-of-the-art NLP and search methods, our approach offers a practical and efficient solution that can achieve up to 0.79 precision and 0.25 recall for synonym discovery. It scales to large taxonomies and can be used at runtime in large taxonomy-driven document retrieval systems.
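The pipeline in the abstract has two stages: embed every term, then retrieve each term's nearest neighbours as synonym candidates and score them against the curated taxonomy. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The 3-dimensional vectors are hypothetical stand-ins for the output of the pre-trained BERT-based encoder. A brute-force cosine search in pure Python replaces the FAISS index; it is conceptually what an exact inner-product index over L2-normalized vectors (for instance FAISS `IndexFlatIP`) computes. The function names, toy terms, values of `k` and `threshold`, and the pairwise precision/recall protocol are all illustrative assumptions rather than the paper's configuration.

```python
import math
from itertools import combinations


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def candidate_synonym_pairs(terms, embeddings, k=2, threshold=0.9):
    """Exact brute-force k-nearest-neighbour search by cosine similarity.

    Hypothetical stand-in for a FAISS inner-product index over normalized
    vectors. Links each term to those of its k nearest neighbours whose
    cosine similarity is at least `threshold`; pairs are unordered frozensets.
    """
    unit = [normalize(v) for v in embeddings]
    pairs = set()
    for i, query in enumerate(unit):
        # Score every other term; a term is trivially its own nearest neighbour.
        scored = sorted(
            ((sum(a * b for a, b in zip(query, other)), j)
             for j, other in enumerate(unit) if j != i),
            reverse=True,
        )
        for score, j in scored[:k]:
            if score >= threshold:
                pairs.add(frozenset((terms[i], terms[j])))
    return pairs


def gold_synonym_pairs(concept_of):
    """Every unordered pair of terms that the taxonomy files under one concept."""
    members = {}
    for term, concept in concept_of.items():
        members.setdefault(concept, []).append(term)
    return {frozenset(p) for group in members.values() for p in combinations(group, 2)}


def precision_recall(predicted, gold):
    """Pairwise precision and recall of predicted synonym pairs."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: 3-d vectors standing in for BERT embeddings.
terms = ["heart attack", "myocardial infarction", "cardiac arrest",
         "graphene", "monolayer graphite", "MI"]
embeddings = [[0.9, 0.1, 0.0], [0.85, 0.15, 0.0], [0.7, 0.3, 0.0],
              [0.0, 0.1, 0.9], [0.0, 0.2, 0.85], [0.3, 0.6, 0.5]]
# Hypothetical gold taxonomy: term -> concept identifier.
concept_of = {"heart attack": "C1", "myocardial infarction": "C1", "MI": "C1",
              "cardiac arrest": "C2", "graphene": "C3", "monolayer graphite": "C3"}

predicted = candidate_synonym_pairs(terms, embeddings, k=2, threshold=0.9)
print(precision_recall(predicted, gold_synonym_pairs(concept_of)))  # → (0.5, 0.5)
```

In this toy space, "cardiac arrest" lies close to the two heart-attack terms and produces false positives, while the abbreviation "MI" lies far from its gold synonyms, so both of its pairs are missed. Raising `threshold` to 0.96 drops the "heart attack"/"cardiac arrest" pair and lifts precision to 2/3 without changing recall, which shows the kind of precision/recall trade-off such a cut-off controls. At the scale of 770,000 terms, the quadratic loop here is the part a FAISS index would replace.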