Abstract: Large Language Models (LLMs) have demonstrated remarkable general intelligence
but still struggle with hallucination. Retrieval Augmented Generation (RAG)
mitigates this problem by incorporating external knowledge sources. However,
a critical challenge in RAG systems is the misalignment between the embedding-based
retriever and the LLM generator. This paper introduces a novel approach to aligning
the embedding model with the LLM through Citation Enhanced Generation (CEG). Our
method leverages citation information from LLM outputs to create positive
and negative training samples for fine-tuning the embedding model, thereby
incorporating LLM feedback into embedding training and achieving alignment
between the two components. Experimental results demonstrate significant improvements in RAG
performance across multiple datasets, with particularly notable gains in
specialized domains.
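The core idea in the abstract, turning the LLM's citations into contrastive training signal for the retriever, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's actual implementation; the function name, data layout, and the rule "cited passage = positive, retrieved-but-uncited passage = negative" are assumptions for illustration.

```python
# Hypothetical sketch: derive contrastive training pairs for an embedding
# model from citation feedback in an LLM's generated answer. Passages the
# LLM actually cites are treated as positives for the query; passages that
# were retrieved but never cited are treated as negatives. All identifiers
# here are illustrative, not from the paper.

def build_citation_pairs(query, retrieved, cited_ids):
    """Split retrieved passages into (query, positive, negative) triples.

    retrieved: list of {"id": str, "text": str} passages from the retriever.
    cited_ids: set of passage ids the LLM cited in its output.
    """
    positives = [p for p in retrieved if p["id"] in cited_ids]
    negatives = [p for p in retrieved if p["id"] not in cited_ids]
    return [
        {"query": query, "positive": pos["text"], "negative": neg["text"]}
        for pos in positives
        for neg in negatives
    ]

retrieved = [
    {"id": "d1", "text": "Passage the LLM cited as evidence."},
    {"id": "d2", "text": "Retrieved passage the LLM ignored."},
    {"id": "d3", "text": "Another retrieved but uncited passage."},
]
pairs = build_citation_pairs("example query", retrieved, cited_ids={"d1"})
# One positive and two negatives yield two (query, positive, negative)
# triples, suitable for a standard triplet or contrastive fine-tuning loss.
```

Each resulting triple can then be fed to an off-the-shelf contrastive objective (e.g., a triplet or InfoNCE loss) when fine-tuning the embedding model, which is how the abstract's "LLM feedback" would flow back into retriever training.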
Paper Type: Short
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Information Retrieval and Text Mining
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2651