Attention-Guided Context Pruning for Large Language Models

ACL ARR 2026 January Submission8427 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Large Language Models, Context Pruning, Long Context
Abstract: While large language models (LLMs) demonstrate remarkable capabilities, their effectiveness is hindered by the ever-increasing length of prompts, which introduces information scarcity and substantial computational overhead. Existing prompt pruning methods, such as LLMLingua, lack contextual awareness and offer limited flexibility in controlling compression rates, often resulting in either insufficient pruning or excessive information loss. In this paper, we propose AttentionPrompt, an attention-guided prompt pruning method for LLMs in the RAG setting. The core idea of AttentionPrompt lies in its attention focus mechanism, which reformulates user queries into a next-token prediction paradigm. This mechanism isolates the query's semantic focus to a single token, enabling precise and efficient attention calculation between queries and contexts. Extensive experiments on LongBench and Babilong benchmarks show that AttentionPrompt achieves up to 6.3x context compression while outperforming LLMLingua methods by around 10% in key metrics.
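The abstract's core mechanism, scoring context tokens by their attention from a single query-focus token and keeping only the highest-scoring ones, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, shapes, and the `keep_ratio` parameter are all illustrative assumptions, and real attention scores would come from an LLM's attention heads rather than raw dot products.

```python
import numpy as np

def attention_prune(context_vecs, focus_vec, keep_ratio=0.25):
    """Illustrative sketch: score each context token by scaled
    dot-product attention from a single query 'focus' vector, then
    keep the top keep_ratio fraction of tokens in original order."""
    d = focus_vec.shape[-1]
    scores = context_vecs @ focus_vec / np.sqrt(d)        # (n,) raw scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # softmax attention
    k = max(1, int(len(weights) * keep_ratio))
    keep = np.sort(np.argsort(weights)[-k:])              # top-k, keep order
    return keep, context_vecs[keep]

# Toy example: 8 context "token" embeddings of dimension 4.
rng = np.random.default_rng(0)
ctx = rng.normal(size=(8, 4))
focus = ctx[3] + 0.1 * rng.normal(size=4)  # query focus resembling token 3
kept_idx, pruned = attention_prune(ctx, focus, keep_ratio=0.5)
print(kept_idx, pruned.shape)
```

With `keep_ratio=0.5` the sketch retains half of the context tokens, giving the explicit control over the compression rate that the abstract contrasts with LLMLingua's fixed behavior.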
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: pruning
Languages Studied: English
Submission Number: 8427