A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion

Yanzhen Shen; Yu Zhang; Yunyi Zhang; Jiawei Han

Published: 20 Dec 2024, Last Modified: 31 Dec 2024. Venue: AI4Research @ AAAI 2025 (Oral). License: CC BY 4.0

Keywords: Information systems, Data mining, Taxonomy Construction, Instruction Tuning

TL;DR: A unified taxonomy-guided instruction tuning framework that leverages large language models to address entity set expansion, taxonomy expansion, and seed-guided taxonomy construction by jointly teaching the skills of finding "siblings" and "parents."

Abstract: Scientific taxonomies play a crucial role in organizing and structuring scientific knowledge across fields such as Medical Science and Computer Science. With the rapid advancement of scientific research and the emergence of new scientific concepts, researchers have sought to populate existing taxonomies automatically. Entity set expansion, taxonomy expansion, and seed-guided taxonomy construction are three representative tasks that can be applied to automatic taxonomy construction. Previous studies treat them as three separate tasks, so their proposed techniques usually work for only one specific task and lack generalizability and a holistic perspective. In this paper, we aim for a unified solution to the three tasks. Specifically, we identify two common skills needed for all three tasks: finding siblings and finding parents. We propose a taxonomy-guided instruction tuning framework, TaxoInstruct, that teaches a large language model to generate siblings and parents for query entities, where the joint pre-training process facilitates the mutual enhancement of the two skills. Extensive experiments on multiple benchmark datasets demonstrate the efficacy of TaxoInstruct, which outperforms task-specific baselines across all three tasks.
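To make the two skills concrete, here is a minimal sketch of how an existing taxonomy could be turned into instruction-tuning examples for "finding siblings" and "finding parents." It is an illustration only: the toy taxonomy, the prompt wording, the `build_examples` helper, and the record fields are all assumptions made here and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); not data from the paper.
TOY_TAXONOMY = {
    "machine learning": "computer science",
    "databases": "computer science",
    "computer networks": "computer science",
    "cardiology": "medical science",
    "neurology": "medical science",
}


def build_examples(child_to_parent, num_seeds=1):
    """Build illustrative instruction records for the two skills.

    The prompt templates below are assumptions, not the paper's prompts.
    """
    # Invert the child -> parent map so siblings can be grouped together.
    children = defaultdict(list)
    for child, parent in child_to_parent.items():
        children[parent].append(child)

    candidates = ", ".join(sorted(children))
    examples = []
    for parent in sorted(children):
        kids = sorted(children[parent])

        # Skill 1: finding siblings. Hold out some children as the target.
        if len(kids) > num_seeds:
            seeds, rest = kids[:num_seeds], kids[num_seeds:]
            examples.append({
                "skill": "siblings",
                "instruction": "List other entities that are siblings of the seed entities.",
                "input": "Seeds: " + ", ".join(seeds),
                "output": ", ".join(rest),
            })

        # Skill 2: finding parents. One record per child.
        for kid in kids:
            examples.append({
                "skill": "parent",
                "instruction": "Choose the parent of the query entity from the candidates.",
                "input": f"Query: {kid}. Candidates: {candidates}",
                "output": parent,
            })
    return examples


if __name__ == "__main__":
    for record in build_examples(TOY_TAXONOMY):
        print(record["skill"], "|", record["input"], "->", record["output"])
```

Mixing both record types in one training set is one simple way to realize the joint training the abstract describes; how the actual framework formats and balances its examples is not specified on this page.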

Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.

Submission Number: 30
