Keywords: Low-rank approximation, KV cache compression, LLM inference optimization
Abstract: Truncated Singular Value Decomposition (SVD) has recently attracted renewed attention for its effectiveness in model optimization tasks such as LoRA initialization and KV-cache compression. However, exact SVD remains computationally expensive, while approximate methods such as power iteration often introduce non-negligible errors. In this paper, we present Hadamard PCA-based Power Iteration (HaPPI), a new algorithm that significantly improves the accuracy of low-rank approximation while remaining computationally efficient. Compared to prior methods, HaPPI achieves the lowest approximation error at a practical computational cost. Building on this foundation, we further propose HaPPI-KV, which combines HaPPI with key whitening and residual quantization to deliver high compression ratios for key-value (KV) caches. By leveraging both the efficiency and precision of HaPPI, HaPPI-KV achieves a state-of-the-art trade-off between memory footprint and model quality, highlighting the superiority of our approach.
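For context, the sketch below shows a generic randomized power (subspace) iteration for a rank-r approximation, i.e., the kind of baseline the abstract contrasts against; it is not the HaPPI algorithm itself, and the function name, the number of iterations, and the Gaussian sketch are illustrative assumptions rather than details taken from the paper.

    # Generic randomized power iteration for rank-r approximation (baseline sketch).
    # NOT HaPPI: HaPPI replaces the random-sketch stage with a Hadamard PCA step.
    import numpy as np

    def power_iteration_lowrank(A, r, n_iter=2, seed=0):
        """Return (U, S, Vt) approximating the top-r truncated SVD of A."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Q = rng.standard_normal((n, r))        # random sketch of the row space
        Y = A @ Q
        for _ in range(n_iter):
            # Power iterations sharpen the captured subspace; re-orthonormalize
            # each round for numerical stability.
            Y, _ = np.linalg.qr(Y)
            Y = A @ (A.T @ Y)
        Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(A)
        B = Q.T @ A                            # small r x n projected matrix
        Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
        return Q @ Ub, S, Vt                   # rank-r factors of A

    # Usage: measure the relative Frobenius-norm approximation error.
    A = np.random.default_rng(1).standard_normal((512, 256))
    U, S, Vt = power_iteration_lowrank(A, r=16)
    err = np.linalg.norm(A - (U * S) @ Vt) / np.linalg.norm(A)
    print(f"relative approximation error: {err:.4f}")

As the abstract notes, this kind of approximate iteration trades accuracy for speed; HaPPI's stated goal is to reduce that approximation error at comparable cost.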
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16288