Bridging the Memorization-Utilization Gap: Near-Lossless Context Compression via Reinforcement Learning

ACL ARR 2026 January Submission7797 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: context compression, reinforcement learning, long-context modeling, representation learning, generalization, question answering
Abstract: Despite recent progress in context compression, we identify a fundamental memorization-utilization gap: models can compress context with near-perfect fidelity yet fail to effectively utilize the compressed representations for downstream tasks. We address this with a holistic training paradigm spanning pretraining, instruction tuning, and reinforcement learning, built upon an average-pooling compression backbone. Our key innovation is outcome-based RL that enables implicit expansion: the model learns to adaptively unfold task-relevant details during generation, interleaving reconstruction with reasoning. We achieve near-lossless 16$\times$ compression across 7B and 32B models, recovering over 98\% of full-context QA performance and outperforming prior methods by 11 points. Our 32B model demonstrates strong out-of-distribution and length generalization, robustly scaling to 120k-token contexts despite being trained on no more than 4k tokens, and matching full-context performance on NIAH, LongBench v2, and multi-hop reasoning.
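The average-pooling compression the abstract refers to can be sketched as follows: a sequence of token hidden states is reduced 16$\times$ by averaging each consecutive group of 16 vectors into one. This is a minimal illustrative sketch only; the grouping, truncation of a trailing partial group, and the function name `avg_pool_compress` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def avg_pool_compress(hidden: np.ndarray, ratio: int = 16) -> np.ndarray:
    """Compress a (seq_len, d) matrix of hidden states by averaging
    each consecutive group of `ratio` vectors into a single vector.

    A trailing partial group (seq_len not divisible by `ratio`) is
    dropped here for simplicity -- an assumption of this sketch.
    """
    seq_len, d = hidden.shape
    usable = (seq_len // ratio) * ratio
    # Reshape into (num_groups, ratio, d) and average over each group.
    return hidden[:usable].reshape(-1, ratio, d).mean(axis=1)

# Toy example: 64 token states of dimension 4 compress to 4 vectors.
states = np.arange(64 * 4, dtype=np.float64).reshape(64, 4)
compressed = avg_pool_compress(states, ratio=16)
print(compressed.shape)  # (4, 4)
```

At a 16$\times$ ratio, a 120k-token context (the longest the abstract reports) would be represented by 7,500 such pooled vectors.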
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: representation learning, generalization, reinforcement learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 7797