Memory Augmentation Unlocks Efficient Chain-of-Thought Reasoning

ACL ARR 2026 January Submission 8632 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Chain-of-Thought (CoT), Reasoning Models, Memory-Augmented Compression, Inference Acceleration
Abstract: Reasoning models achieve remarkable performance through Chain-of-Thought (CoT), yet the verbose reasoning process introduces significant inference latency and computational overhead. CoT compression aims to accelerate inference; however, naive compression approaches inevitably disrupt the coherence of reasoning logic, leading to severe performance collapse. To address this trade-off, we leverage the Context-Generation Substitution Law: shifting the computational burden from expensive serial generation to efficient parallel context processing. Guided by this insight, we propose Memory-Augmented Compression, a generalizable paradigm that utilizes an explicit memory of abstracted reasoning patterns as a cognitive scaffold. By injecting high-density reasoning patterns into the context, we incur only a marginal prefill cost to bypass redundant reasoning steps, achieving massive output compression. Extensive experiments demonstrate the superiority of this paradigm. Remarkably, as a training-free, plug-and-play solution, our method outperforms a fine-tuned baseline by over 22 percentage points on GSM8K. On the challenging MATH-500 benchmark, we achieve robust performance, surpassing the uncompressed Standard CoT by nearly 10%. Comprehensive evaluations validate that our approach effectively establishes a new efficiency frontier for reasoning models.
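The core mechanism in the abstract, injecting abstracted reasoning patterns into the context so the model can skip redundant serial generation, can be illustrated with a minimal prompting sketch. This is not the authors' implementation: the pattern store, the keyword retriever, and the prompt template below are all hypothetical stand-ins for whatever memory and retrieval the paper actually uses.

```python
# Illustrative sketch of memory-augmented prompting.
# PATTERN_MEMORY, retrieve_patterns, and build_prompt are hypothetical
# names, not the paper's API; retrieval here is naive keyword matching
# standing in for a real pattern-abstraction pipeline.

PATTERN_MEMORY = {
    "rate_problem": "Pattern: distance = rate * time; solve for the unknown term.",
    "percent_change": "Pattern: new = old * (1 + p/100); isolate p when needed.",
}

def retrieve_patterns(question: str) -> list[str]:
    """Toy retriever: pick stored patterns by surface keywords."""
    hits = []
    if "mph" in question or "speed" in question:
        hits.append(PATTERN_MEMORY["rate_problem"])
    if "%" in question or "percent" in question:
        hits.append(PATTERN_MEMORY["percent_change"])
    return hits

def build_prompt(question: str) -> str:
    """Prepend high-density patterns to the context (cheap, parallel
    prefill) and request a compressed derivation (fewer serial decode
    tokens), mirroring the context-for-generation substitution idea."""
    scaffold = "\n".join(retrieve_patterns(question))
    return (
        f"Reasoning patterns:\n{scaffold}\n\n"
        f"Question: {question}\n"
        "Apply the patterns above and answer concisely, skipping any "
        "steps the patterns already cover."
    )

prompt = build_prompt(
    "A train travels 120 miles at a constant speed of 60 mph. "
    "How long does the trip take?"
)
print(prompt)
```

The point of the sketch is the cost asymmetry: the injected scaffold only lengthens the prompt, which is processed in parallel during prefill, while the instruction to skip covered steps is what shortens the expensive token-by-token decode.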
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, Mathematical Reasoning, Language Modeling
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 8632