Abstract: Taxonomies provide structural representations of knowledge and are crucial in various applications. The task of taxonomy expansion involves integrating emerging entities into existing taxonomies by identifying appropriate parent entities for these new query entities. Previous methods rely on self-supervised techniques that generate annotation data from existing taxonomies but are less effective with small taxonomies (fewer than 100 entities). In this work, we introduce CodeTaxo, a novel approach that leverages large language models through code language prompts to capture the taxonomic structure. Extensive experiments on five real-world benchmarks from different domains demonstrate that CodeTaxo consistently achieves superior performance across all evaluation metrics, significantly outperforming previous state-of-the-art methods. The code and data are available at https://anonymous.4open.science/r/CodeTaxo4Review-47DB.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: knowledge base construction, zero/few-shot extraction
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1515
Loading