Domain-level relation extraction for informative taxonomy learning

Published: 2025, Last Modified: 07 Jan 2026, Data Min. Knowl. Discov. 2025, CC BY-SA 4.0
Abstract: As technology advances rapidly, the demand for automatic taxonomy construction keeps growing. This paper addresses the challenge of extracting synonym and hyponym relations from diverse, domain-specific scientific literature. We treat this task as a domain-level relation extraction problem and then build a hierarchical taxonomy by generating an arborescence, a directed tree in which every concept except the root has exactly one parent. The proposed Multi-Scale Identity Connection (MSIC) model captures inter-entity relationships at multiple scales and empirically outperforms existing relation extraction models. To make the approach practical, a two-stage optimization improves efficiency without compromising performance. We also propose the Depth-prioritized Maximum Spanning Arborescence (DMSA) algorithm, an efficient strategy for generating an informative, well-structured taxonomy tree. We annotated a small dataset to train and validate MSIC, then applied the model to a large domain-specific dataset for taxonomy induction. The experiments show that DMSA efficiently constructs an information-rich taxonomy tree from extensive domain-specific scientific literature. These results confirm the effectiveness of the approach and indicate that it can support industrial-grade applications.
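The abstract does not describe how DMSA adds its depth preference, so the sketch below shows only the standard building block its name refers to: a maximum spanning arborescence computed with the Chu-Liu/Edmonds algorithm over scored "parent of" edges. It is not the authors' DMSA. The concept names, edge scores and function names (`max_spanning_arborescence`, `_msa`, `_find_cycle`) are hypothetical stand-ins for what a relation extractor such as MSIC might output.

```python
from typing import Dict, List, Optional, Tuple

# (parent, child, score): a hypothetical scored candidate edge "parent is a hypernym of child".
Edge = Tuple[int, int, float]


def _find_cycle(parent: Dict[int, int], root: int) -> Optional[List[int]]:
    """Return the nodes of one cycle in the parent map, or None if it is a tree."""
    done = set()  # nodes already known to lead back to the root
    for start in parent:
        path: List[int] = []
        pos: Dict[int, int] = {}  # node -> index on the current walk
        v = start
        while v != root and v not in done:
            if v in pos:
                return path[pos[v]:]  # walked back onto the current path
            pos[v] = len(path)
            path.append(v)
            v = parent[v]
        done.update(path)
    return None


def _msa(n: int, edges: List[Edge], root: int) -> List[int]:
    """Chu-Liu/Edmonds on nodes 0..n-1; returns indices of the chosen edges."""
    # 1. Greedily take the best-scoring incoming edge of every non-root node.
    best: Dict[int, int] = {}
    for i, (u, v, w) in enumerate(edges):
        if v == root or u == v:
            continue
        if v not in best or w > edges[best[v]][2]:
            best[v] = i
    if len(best) != n - 1:
        raise ValueError("some node has no incoming edge; no arborescence exists")
    # 2. If the greedy choice is already a tree, it is optimal.
    cycle = _find_cycle({v: edges[i][0] for v, i in best.items()}, root)
    if cycle is None:
        return list(best.values())
    # 3. Contract the cycle into one node c; an edge entering the cycle at v is
    #    rescored relative to the cycle edge into v that it would replace.
    in_cycle = set(cycle)
    new_id: Dict[int, int] = {}
    for v in range(n):
        if v not in in_cycle:
            new_id[v] = len(new_id)
    c = len(new_id)
    new_edges: List[Edge] = []
    origin: List[int] = []  # origin[j] = index in `edges` of new edge j
    for i, (u, v, w) in enumerate(edges):
        if u in in_cycle and v in in_cycle:
            continue
        nu = c if u in in_cycle else new_id[u]
        nv = c if v in in_cycle else new_id[v]
        if v in in_cycle:
            w -= edges[best[v]][2]
        new_edges.append((nu, nv, w))
        origin.append(i)
    # 4. Solve the smaller problem, then expand c: keep every cycle edge except
    #    the one into the node where the chosen entering edge lands.
    chosen = _msa(c + 1, new_edges, new_id[root])
    entering = next(origin[j] for j in chosen if new_edges[j][1] == c)
    broken = edges[entering][1]
    return [origin[j] for j in chosen] + [best[v] for v in cycle if v != broken]


def max_spanning_arborescence(n: int, edges: List[Edge], root: int = 0) -> Dict[int, int]:
    """Maximum-score spanning arborescence as a {child: parent} map."""
    return {edges[i][1]: edges[i][0] for i in _msa(n, edges, root)}


if __name__ == "__main__":
    # Hypothetical concepts and relation scores (not from the paper).
    names = ["computer science", "machine learning", "deep learning", "neural network"]
    scores: List[Edge] = [
        (0, 1, 5.0), (0, 2, 1.0), (0, 3, 1.0),
        (1, 2, 8.0), (2, 1, 9.0),  # conflicting pair: greedy picks form a cycle
        (2, 3, 7.0), (1, 3, 3.0),
    ]
    tree = max_spanning_arborescence(len(names), scores, root=0)
    for child, parent in sorted(tree.items()):
        print(f"{names[parent]} -> {names[child]}")
```

Running the script prints `computer science -> machine learning`, `machine learning -> deep learning` and `deep learning -> neural network` (total score 20). Picking each node's best parent independently would select both `deep learning -> machine learning` and `machine learning -> deep learning`. The contraction step resolves that cycle, so every concept still ends up with exactly one parent.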