Track: Semantics and knowledge
Keywords: Taxonomy Completion, LLM, Context Compression, Mixup
Abstract: Taxonomy completion aims to integrate new concepts into an existing taxonomy by determining their appropriate hypernym and hyponym positions. While both semantic and structural information are crucial for this task, existing approaches often struggle to balance the two effectively. In this paper, we propose **COMI**, an efficient taxonomy completion framework that leverages large language models (LLMs) to capture semantic and structural information in a unified manner. COMI **co**mpresses node semantics into token representations, enabling LLMs to efficiently process input structures composed of these tokens. To enhance the model's understanding of the structure, we further fine-tune it with contrastive learning using **mi**xup data augmentation, where mixup generates diverse and challenging negative samples. Through these innovations, COMI improves the integration of semantic and structural information, leading to more accurate taxonomy completion. Experimental results on three real-world datasets demonstrate that COMI achieves state-of-the-art performance while delivering inference up to 284$\times$ faster than the previous best method. Our code and compressed tokens will be made available for further study upon publication.
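As a minimal sketch of the mixup-based negative generation the abstract describes (the function name, clipping of the mixing coefficient, and NumPy embedding layout are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def mixup_negatives(pos: np.ndarray, negs: np.ndarray,
                    alpha: float = 1.0, seed: int = 0) -> np.ndarray:
    """Interpolate a positive embedding with negative embeddings to
    produce harder negatives for contrastive learning.

    pos:  shape (1, d)  -- anchor (positive) node embedding
    negs: shape (n, d)  -- embeddings of sampled negative nodes
    Returns: shape (n, d) mixed negatives.
    """
    rng = np.random.default_rng(seed)
    # Standard mixup coefficient, lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(len(negs), 1))
    # Assumed design choice: keep the mixed point closer to the negative
    # (lambda <= 0.5) so its label plausibly remains "negative".
    lam = np.minimum(lam, 1.0 - lam)
    return lam * pos + (1.0 - lam) * negs
```

Each mixed sample lies on the segment between the anchor and a true negative, which is what makes it a "challenging" negative near the decision boundary.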
Submission Number: 2051