Abstract: Retrieval-Augmented Generation (RAG) has significantly mitigated hallucination issues in Large Language Models (LLMs), with context compression playing a pivotal role in improving the efficiency of RAG systems. Traditional context compression approaches are either extractive or abstractive. Extractive methods often perform poorly because they model sentences independently, while abstractive methods suffer from high latency and the risk of introducing hallucinations. In this paper, we propose GCR, a novel generative compression method that reformulates context compression as sentence index generation, ensuring minimal inference latency. GCR effectively models semantic interactions between sentences, prevents potential hallucinations during compression, and offers adaptive control over the compression rate. Extensive experiments across three knowledge-intensive tasks confirm the effectiveness and efficiency of our method.
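To make the core idea concrete, the sketch below illustrates (in generic Python, not the authors' implementation) how framing compression as sentence index generation rules out hallucination by construction: the compressor only emits indices, so the compressed context can contain nothing but original sentences. The `generate_indices` callable is a hypothetical stand-in for the generative model.

```python
# Illustrative sketch of compression via sentence index generation.
# Not the paper's implementation; `generate_indices` is a placeholder
# for a model that decodes indices conditioned on the query and context.

from typing import Callable, List


def compress_by_indices(
    sentences: List[str],
    generate_indices: Callable[[List[str], str], List[int]],
    query: str,
) -> str:
    """Keep only sentences whose indices the generator emits."""
    raw = generate_indices(sentences, query)
    # Drop out-of-range indices and duplicates, preserving generation order.
    seen, kept = set(), []
    for i in raw:
        if 0 <= i < len(sentences) and i not in seen:
            seen.add(i)
            kept.append(i)
    # The output is a subset of the original sentences, so no new
    # (potentially hallucinated) text can be introduced.
    return " ".join(sentences[i] for i in kept)


if __name__ == "__main__":
    ctx = [
        "The Eiffel Tower is in Paris.",
        "It was completed in 1889.",
        "Paris is the capital of France.",
    ]
    # Dummy generator standing in for an LLM that would emit e.g. "0 2".
    dummy = lambda sents, q: [0, 2]
    print(compress_by_indices(ctx, dummy, "Where is the Eiffel Tower?"))
```

Adjusting how many indices the generator is allowed to emit gives direct, adaptive control over the compression rate.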
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Context Compression, Augmented Generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 370