Anchoring the Cache: Mitigating Contextual Hallucination in KV-Compressed Long-Context Summarization
Keywords: Long-context Summarization, KV cache compression, Hallucination
Abstract: Key-Value (KV) cache compression techniques have improved the efficiency of long-context summarization in Large Language Models (LLMs), but their impact on model hallucination remains underexplored. In this paper, we present the first systematic study of how KV cache compression affects hallucination in long-context summarization, demonstrating that aggressive compression can increase hallucination scores by up to 3.36× compared to the baseline. To mitigate this issue, we propose HalluKV, a decoding-phase strategy that selectively removes generated KV pairs from retrieval heads, the attention heads responsible for retrieving critical information from the source context, thereby anchoring their attention on the preserved source information. Our approach maintains computational efficiency while significantly reducing hallucination across multiple models and datasets, achieving average reductions of up to 5.48 points on Llama-3-8B-Instruct and enabling more trustworthy long-context summarization.
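The decoding-phase eviction idea in the abstract can be illustrated with a minimal sketch. All names, the cache layout, and the eviction rule below are illustrative assumptions for exposition, not the paper's actual implementation: retrieval heads drop KV entries produced during generation, keeping only entries from the preserved source context.

```python
# Hypothetical sketch of the decoding-phase strategy described in the
# abstract: for a designated set of "retrieval heads", evict KV pairs
# produced during generation so those heads attend only to the preserved
# source context. Cache layout and function name are assumptions.

def anchor_retrieval_heads(kv_cache, retrieval_heads, source_len):
    """Drop generated-token KV pairs (positions >= source_len) from
    retrieval heads; all other heads keep their full cache."""
    anchored = {}
    for head, entries in kv_cache.items():
        if head in retrieval_heads:
            # Keep only KV entries that came from the source context.
            anchored[head] = [(pos, kv) for pos, kv in entries
                              if pos < source_len]
        else:
            anchored[head] = list(entries)
    return anchored

# Toy cache: head index -> list of (position, key/value payload).
# Positions 0-1 come from the source; 2-3 were produced during decoding.
cache = {
    0: [(0, "src0"), (1, "src1"), (2, "gen2"), (3, "gen3")],  # retrieval head
    1: [(0, "src0"), (1, "src1"), (2, "gen2"), (3, "gen3")],  # ordinary head
}
out = anchor_retrieval_heads(cache, retrieval_heads={0}, source_len=2)
print(out[0])  # retrieval head keeps only source positions 0 and 1
print(out[1])  # ordinary head keeps its full cache
```

In a real implementation the eviction would operate on per-head key/value tensors inside the model's attention cache rather than on Python lists; this sketch only captures the selection logic.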
Paper Type: Long
Research Area: Summarization
Research Area Keywords: Summarization, Efficient/Low-Resource Methods for NLP, Interpretability and Analysis of Models for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings (efficiency)
Languages Studied: English
Submission Number: 3950