Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention

03 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Context Compression
TL;DR: This paper introduces HyCo2, a hybrid context compression method for Large Language Models that balances retaining both crucial local details and overall global semantics to improve long-text understanding and efficiency.
Abstract: Large Language Models (LLMs) face significant challenges in long-sequence inference due to computational inefficiency and redundant processing, driving interest in context compression techniques. Existing methods either rely on token importance to perform hard local compression or encode the context into latent representations for soft global compression. However, the former struggles to retain global information, while the latter struggles to preserve local details. To address this, we propose Hybrid Context Compression (HyCo2) for LLMs, which integrates global and local perspectives to guide compression, retaining both the essential semantics and the critical details needed for task completion. Specifically, we employ a hybrid adapter to refine global semantics, motivated by the observation that different adapters excel at different tasks. We then add a classification layer that assigns each context token a retention probability based on the local view, determining whether it is retained or discarded. To foster a balanced integration of global and local compression, we introduce auxiliary paraphrasing and completion pretraining before instruction tuning, which encourages the model to emphasize instruction-relevant information while preserving essential local details. Experiments show that HyCo2 significantly enhances long-text reasoning while reducing token usage: it improves various LLM series by an average of 13.1% across seven knowledge-intensive QA benchmarks, and matches the performance of uncompressed methods while cutting token consumption by 88.8%. Our code will be available at \url{https://anonymous.4open.science/r/HyCo2}.
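The two-sided compression described in the abstract can be illustrated schematically: per-token retention probabilities drive hard local selection, while a pooled representation stands in for the soft global summary. The sketch below is hypothetical (function name, mean-pooling in place of the hybrid adapter, and the top-k selection rule are all our assumptions, not HyCo2's actual implementation):

```python
import numpy as np

def hybrid_compress(token_embs, retain_logits, keep_ratio=0.25):
    """Toy sketch of hybrid context compression.

    Local view: keep the tokens with the highest retention probability
    (hard compression). Global view: a mean-pooled vector stands in for
    the soft latent summary produced by the hybrid adapter in the paper.
    """
    n = token_embs.shape[0]
    k = max(1, int(n * keep_ratio))
    # Retention probability per token, as from a classification layer.
    probs = 1.0 / (1.0 + np.exp(-retain_logits))
    kept_idx = np.sort(np.argsort(probs)[-k:])  # keep original token order
    local_tokens = token_embs[kept_idx]
    # Placeholder for the soft global representation.
    global_vec = token_embs.mean(axis=0, keepdims=True)
    # Compressed context: global summary prepended to retained tokens.
    return np.concatenate([global_vec, local_tokens], axis=0)

rng = np.random.default_rng(0)
embs = rng.normal(size=(16, 8))   # 16 context tokens, dim 8
logits = rng.normal(size=16)      # per-token retention logits
out = hybrid_compress(embs, logits, keep_ratio=0.25)
print(out.shape)  # (5, 8): 1 global vector + 4 retained tokens
```

With `keep_ratio=0.25`, 16 tokens compress to 4 retained tokens plus one summary vector, i.e. roughly the order of token reduction the abstract reports.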
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1221