LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

ICLR 2024 Workshop ME-FoMo Submission 6 Authors

Published: 04 Mar 2024, Last Modified: 30 Apr 2024. Venue: ME-FoMo 2024 Poster. License: CC BY 4.0

Keywords: Prompt Compression, Long Context, LLMs, Black-box LLMs, Efficient Method

Abstract: In long context scenarios, large language models (LLMs) face three main challenges: higher computational cost, performance reduction, and position bias. Research indicates that LLM performance hinges on the density and position of key information in the input prompt. Addressing this, we introduce LongLLMLingua, a method for prompt compression that improves LLMs’ key information perception, effectively tackling these challenges. Our extensive evaluation across various long context scenarios demonstrates that LongLLMLingua not only enhances performance but also significantly reduces costs and latency. For instance, in the NaturalQuestions benchmark, LongLLMLingua boosts performance by up to 21.4% with around 4x fewer tokens in GPT-3.5-Turbo, leading to substantial cost savings. It achieves a 94.0% cost reduction in the LooGLE benchmark. Moreover, when compressing prompts of about 10k tokens at rates of 2x-10x, LongLLMLingua can accelerate end-to-end latency by 1.4x-2.6x.

Submission Number: 6
