Aligning Embedding with LLM by Citation Enhanced Generation

ACL ARR 2025 May Submission2651 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · Everyone · CC BY 4.0
Abstract: Large Language Models (LLMs) have demonstrated remarkable general intelligence but still struggle with hallucination. Retrieval Augmented Generation (RAG) mitigates this by incorporating external knowledge sources. However, a critical challenge in RAG systems is the misalignment between the embedding-based retriever and the LLM generator. This paper introduces a novel approach to align the embedding model with the LLM through Citation Enhanced Generation (CEG). Our method leverages citation information from LLM outputs to create positive and negative training samples for embedding model fine-tuning, thereby incorporating LLM feedback into embedding model training and achieving alignment between the two components. Experimental results demonstrate significant improvements in RAG performance across multiple datasets, with particularly notable gains in specialized domains.
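The abstract's core idea, using citations in LLM outputs to label retrieved documents as positive or negative training samples, can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' implementation: the citation-marker format (`[1]`-style) and the function name are assumptions, and the resulting triples would feed a contrastive fine-tuning objective (e.g. triplet or InfoNCE loss) that is omitted here.

```python
# Hypothetical sketch of the CEG sample-construction step: documents the
# LLM actually cites become positives, retrieved-but-uncited documents
# become negatives, for embedding-model fine-tuning.
import re

def build_training_samples(query, retrieved_docs, llm_answer):
    """Split retrieved docs into positives/negatives via citation markers.

    retrieved_docs: dict mapping doc id (int) -> document text.
    llm_answer: LLM output containing markers like "[1]" (assumed format).
    Returns (query, positive_doc, negative_doc) triples for a
    contrastive fine-tuning objective.
    """
    cited_ids = {int(m) for m in re.findall(r"\[(\d+)\]", llm_answer)}
    positives = [d for i, d in retrieved_docs.items() if i in cited_ids]
    negatives = [d for i, d in retrieved_docs.items() if i not in cited_ids]
    return [(query, p, n) for p in positives for n in negatives]

samples = build_training_samples(
    "What causes tides?",
    {1: "Tides are caused by the Moon's gravity.", 2: "A recipe for bread."},
    "Tides result from lunar gravitational pull [1].",
)
# samples -> one triple: cited doc 1 as positive, uncited doc 2 as negative
```

Under this framing, the retriever is pushed toward documents the generator found useful enough to cite, which is how LLM feedback flows into embedding training.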
Paper Type: Short
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Information Retrieval and Text Mining
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2651