LLM-powered Context Augmentation for Heterogeneous Citation Networks

ACL ARR 2024 June Submission4138 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Recent advances in large language models (LLMs) such as ChatGPT and Llama have driven significant progress in natural language processing and diverse AI applications. In this paper, we explore how LLMs can enhance the construction of heterogeneous citation networks by integrating rich contextual information derived from LLMs. We propose a novel approach that augments content-based feature engineering with context-aware techniques. Specifically, we query the contents of the metadata using Llama3 to extract context, encode this knowledge-rich context with the DeBERTa encoder, and construct a knowledge-rich heterogeneous citation network. Experimental results demonstrate that our LLM-powered context augmentation improves author classification by 2% to 24% and author clustering by 6% to 33% compared with existing feature engineering approaches. The dataset and source code are available at https://anonymous.4open.science/r/LLM-citation-252F/.
Paper Type: Short
Research Area: Machine Learning for NLP
Research Area Keywords: graph-based methods, knowledge-augmented methods, knowledge base construction, representation learning, multimodal applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 4138
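The three-step pipeline the abstract describes (query metadata with Llama3, encode the resulting context, build a heterogeneous citation network) could be sketched roughly as below. This is a minimal illustration, not the authors' code: the Llama3 and DeBERTa calls are replaced with stub functions, and all names, graph structure, and helper functions are assumptions for exposition.

```python
# Minimal sketch of the abstract's pipeline. The model calls are stubbed
# placeholders, NOT actual Llama3 / DeBERTa invocations.

from collections import defaultdict


def extract_context(metadata: dict) -> str:
    # Stub for querying Llama3 with a paper's metadata; a real system
    # would prompt the LLM and return its generated contextual text.
    return f"Context for '{metadata['title']}' by {', '.join(metadata['authors'])}"


def encode_context(text: str, dim: int = 8) -> list:
    # Stub for a DeBERTa encoder; a toy deterministic vector stands in
    # for the knowledge-rich embedding.
    return [((hash(text) >> i) % 100) / 100.0 for i in range(dim)]


def build_citation_network(papers: list) -> dict:
    # Heterogeneous graph: 'paper' and 'author' node types, with 'writes'
    # and 'cites' edge types; LLM-derived embeddings attach to paper nodes.
    graph = {"nodes": {}, "edges": defaultdict(list)}
    for p in papers:
        context = extract_context(p)
        graph["nodes"][p["id"]] = {"type": "paper", "feat": encode_context(context)}
        for author in p["authors"]:
            graph["nodes"].setdefault(author, {"type": "author"})
            graph["edges"]["writes"].append((author, p["id"]))
        for cited in p.get("cites", []):
            graph["edges"]["cites"].append((p["id"], cited))
    return graph


papers = [
    {"id": "p1", "title": "Graph NNs", "authors": ["alice"], "cites": ["p2"]},
    {"id": "p2", "title": "LLM Context", "authors": ["bob", "alice"], "cites": []},
]
net = build_citation_network(papers)
```

In a real implementation, the context features on paper nodes would feed a downstream heterogeneous graph model for the author classification and clustering tasks the abstract evaluates.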