R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

Published: 28 Apr 2026, Last Modified: 28 Apr 2026 · MSLD 2026 Poster · CC BY 4.0
Keywords: Efficient Reasoning, KV Cache
Abstract: Reasoning models have demonstrated impressive performance in self-reflection and chain-of-thought reasoning. However, they often produce excessively long outputs, leading to prohibitively large key-value (KV) caches during inference. While chain-of-thought inference significantly improves performance on complex reasoning tasks, it can also lead to reasoning failures when deployed with existing KV cache compression approaches. To address this, we propose \textbf{R}edundancy-aware \textbf{KV} Cache Compression for \textbf{R}easoning models (\textbf{\method}), a novel method specifically targeting redundant tokens in reasoning models. Our method preserves nearly 100\% of the full KV cache performance using only 10\% of the KV cache, substantially outperforming existing KV cache baselines, which reaches only 60\% of the performance. Remarkably, \method even achieves 105\% of full KV cache performance with 16\% of the KV cache. This KV-cache reduction also leads to a 90\% memory saving and a 6.6$\times$ throughput over standard chain-of-thought reasoning inference. Experimental results show that \method consistently outperforms existing KV cache compression baselines across two mathematical reasoning datasets.
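To make the high-level idea concrete, below is a minimal, illustrative sketch of redundancy-aware KV cache eviction. It is not the authors' R-KV implementation: the function name `compress_kv`, the importance/redundancy scores, the mixing weight `alpha`, and the 10% budget are assumptions chosen for exposition, reflecting only the abstract's description of keeping a small, non-redundant subset of cached tokens.

```python
# Illustrative sketch only: NOT the paper's R-KV method. Scoring and
# weighting below are assumptions; R-KV's actual criteria may differ.
import torch

def compress_kv(keys: torch.Tensor,
                values: torch.Tensor,
                attn_scores: torch.Tensor,
                budget_ratio: float = 0.10,
                alpha: float = 0.5):
    """Keep a `budget_ratio` fraction of cached tokens for one head,
    preferring tokens that are important (high accumulated attention)
    and non-redundant (keys dissimilar to other cached keys).

    keys, values: (seq_len, head_dim) cached K/V for one attention head
    attn_scores:  (num_queries, seq_len) attention weights onto the cache
    """
    seq_len = keys.size(0)
    budget = max(1, int(seq_len * budget_ratio))

    # Importance: total attention mass each cached token has received.
    importance = attn_scores.sum(dim=0)                      # (seq_len,)

    # Redundancy: a token whose key is nearly identical to another
    # cached key adds little information; score it by its closest match.
    k_norm = torch.nn.functional.normalize(keys, dim=-1)
    sim = k_norm @ k_norm.T                                  # (seq_len, seq_len)
    sim.fill_diagonal_(float("-inf"))                        # ignore self-similarity
    redundancy = sim.max(dim=-1).values                      # (seq_len,)

    # Combined score: important AND non-redundant tokens rank highest.
    score = alpha * importance - (1 - alpha) * redundancy
    keep = score.topk(budget).indices.sort().values          # preserve causal order
    return keys[keep], values[keep]
```

In a real decoder this selection would run per layer and per head during generation, with the evicted entries freed; the sketch above only shows the token-scoring step that distinguishes a redundancy-aware policy from purely attention-based eviction.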
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 131