Keywords: Prompt Compression, Retrieval Augmented Generation, Efficient Method
Abstract: Prompt compression aims to reduce input prompt lengths to enable cheaper and faster LLM predictions. However, existing prompt compression methods are often limited by modest compression gains, a risk of hallucination, and/or high compression latency. This paper proposes EffComp, an $\textbf{Eff}$icient prompt $\textbf{Comp}$ression framework using a hybrid reinforcement and supervised learning approach for RAG-based open-domain question answering (QA). EffComp employs a BERT-style document reranker and a sentence selector model to enable fast extractive prompt compression at the sentence level. Its extractive nature prevents hallucinations in the compressed prompts. Additionally, the training process is designed to optimize the compression ratio while preserving LLM accuracy. Experiments on four open-domain QA datasets demonstrate that EffComp outperforms state-of-the-art prompt compression methods in prediction accuracy and achieves competitive compression ratios (up to 78.4x) with minimal latency, making it practical for real-world applications.
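The extractive, sentence-level compression the abstract describes can be sketched minimally as follows. This is an illustration, not EffComp's actual method: the paper uses trained BERT-style reranker and selector models, whereas here a hypothetical lexical-overlap scorer (`score`) stands in so the example runs standalone. Because the output is assembled only from input sentences, the compressed prompt cannot contain hallucinated text.

```python
import re


def score(question: str, sentence: str) -> float:
    """Hypothetical relevance scorer: fraction of question tokens that
    also appear in the sentence (a stand-in for a learned selector)."""
    q = set(re.findall(r"\w+", question.lower()))
    s = set(re.findall(r"\w+", sentence.lower()))
    return len(q & s) / max(len(q), 1)


def compress(question: str, sentences: list[str], budget: int) -> str:
    """Extractive compression: keep the `budget` highest-scoring
    sentences, restored to their original document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(question, sentences[i]),
                    reverse=True)
    kept = sorted(ranked[:budget])  # original order preserves coherence
    return " ".join(sentences[i] for i in kept)


docs = [
    "Paris is the capital of France.",
    "Bananas are a popular yellow fruit.",
    "France is a country in Western Europe.",
]
compressed = compress("What is the capital of France?", docs, budget=1)
```

A learned selector would replace `score` with per-sentence probabilities from a BERT-style encoder, but the keep-top-sentences-in-order structure is the same.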
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 19378