Anchoring the Cache: Mitigating Contextual Hallucination in KV-Compressed Long-Context Summarization
Keywords: Long-context Summarization, KV cache compression, Hallucination
Abstract: Key-Value (KV) cache compression techniques have improved the efficiency of long-context summarization in Large Language Models (LLMs), but their impact on model hallucination remains underexplored. In this paper, we present the first systematic study of how KV cache compression affects hallucination in long-context summarization, demonstrating that aggressive compression can increase hallucination scores by up to 3.36× compared to the baseline. To mitigate this issue, we propose HalluKV, a decoding-phase strategy that selectively removes generated KV pairs from retrieval heads, the attention heads responsible for retrieving critical information from the source context, thereby anchoring their attention on the preserved source information. Our approach maintains computational efficiency while significantly reducing hallucination across multiple models and datasets, achieving average reductions of up to 5.48 points on Llama-3-8B-Instruct and enabling more trustworthy long-context summarization.
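The decoding-phase eviction idea in the abstract can be illustrated with a minimal sketch. All names, the cache layout, and the eviction rule below are illustrative assumptions for exposition, not the paper's actual implementation: retrieval heads drop KV entries produced during generation, keeping only entries from the preserved source context.

```python
# Hypothetical sketch of the decoding-phase strategy described in the
# abstract: for a designated set of "retrieval heads", evict KV pairs
# produced during generation so those heads attend only to the preserved
# source context. Cache layout and function name are assumptions.

def anchor_retrieval_heads(kv_cache, retrieval_heads, source_len):
    """Drop generated-token KV pairs (positions >= source_len) from
    retrieval heads; all other heads keep their full cache."""
    anchored = {}
    for head, entries in kv_cache.items():
        if head in retrieval_heads:
            # Keep only KV entries that came from the source context.
            anchored[head] = [(pos, kv) for pos, kv in entries
                              if pos < source_len]
        else:
            anchored[head] = list(entries)
    return anchored

# Toy cache: head index -> list of (position, key/value payload).
# Positions 0-1 come from the source; 2-3 were produced during decoding.
cache = {
    0: [(0, "src0"), (1, "src1"), (2, "gen2"), (3, "gen3")],  # retrieval head
    1: [(0, "src0"), (1, "src1"), (2, "gen2"), (3, "gen3")],  # ordinary head
}
out = anchor_retrieval_heads(cache, retrieval_heads={0}, source_len=2)
print(out[0])  # retrieval head keeps only source positions 0 and 1
print(out[1])  # ordinary head keeps its full cache
```

In a real implementation the eviction would operate on per-head key/value tensors inside the model's attention cache rather than on Python lists; this sketch only captures the selection logic.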
Paper Type: Long
Research Area: Summarization
Research Area Keywords: Summarization, Efficient/Low-Resource Methods for NLP, Interpretability and Analysis of Models for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings (efficiency)
Languages Studied: English
Submission Number: 3950