Head-Level Mechanistic Attribution for Hallucination Control: Training-Free Counteractive Pruning in LVLMs

Submitted to ICLR 2026 on 16 Sept 2025 (modified: 11 Feb 2026). License: CC BY 4.0
Keywords: Vision-Language Models; Object Hallucination; Attention Head Attribution; Dynamic Pruning; InfoSpectralScore
TL;DR: We introduce a fine-grained attribution and pruning method for vision-language models that substantially reduces object hallucinations while preserving caption informativeness, without additional training.
Abstract: Large vision-language models (LVLMs) excel at multimodal tasks but often produce instance-level object hallucinations, describing objects that do not appear in the input image. Because existing methods overlook functional conflicts among attention heads and lack principled, fine-grained attribution and intervention at the head level, hallucination suppression often comes at a substantial cost in semantic informativeness. To overcome these limitations, we propose HACP, a unified framework that enables fine-grained internal hallucination control via precise intervention at the attention-head level. Specifically, we introduce InfoSpectralScore, a novel attribution metric based on eigen-decomposition with spectral variance and entropy penalties, which accurately identifies hallucination-inducing heads. We further develop a dynamic, training-free pruning strategy that adaptively suppresses hallucination-prone heads while reinforcing faithful heads during inference. Extensive experiments across multiple LVLMs and benchmarks demonstrate that HACP achieves state-of-the-art hallucination mitigation, substantially reducing hallucinations while preserving caption informativeness better than existing approaches, and thus offers a robust and transferable solution for controllable and interpretable multimodal generation. The source code will be released upon acceptance.
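The abstract does not give the formulas behind InfoSpectralScore or the pruning rule, so the following is only a minimal illustrative sketch of the two-stage idea it describes: (1) score each attention head via an eigen-decomposition with variance and entropy penalties on the spectrum, and (2) at inference time, gate heads by their scores, zeroing out the lowest-scoring ("hallucination-prone") heads and mildly boosting the highest-scoring ("faithful") ones. The function names, the exact combination of terms, and the quantile thresholds below are all assumptions, not the authors' definitions.

```python
import numpy as np

def info_spectral_score(attn, var_weight=0.5, ent_weight=0.5):
    """Hypothetical head-attribution score in the spirit of InfoSpectralScore.

    attn: (T, T) attention matrix for one head. We eigen-decompose its
    symmetrized Gram matrix and reward energy concentrated in the leading
    mode, while penalizing spectral variance and spectral entropy.
    The specific form and weights are illustrative assumptions.
    """
    gram = attn @ attn.T                         # symmetric PSD matrix
    eigvals = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    p = eigvals / (eigvals.sum() + 1e-12)        # normalized spectrum
    info = p.max()                               # leading-mode concentration
    var_pen = p.var()                            # spectral variance penalty
    ent_pen = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))  # entropy in [0, 1]
    return info - var_weight * var_pen - ent_weight * ent_pen

def head_gates(scores, low_q=0.1, high_q=0.9, boost=1.1):
    """Map per-head scores to inference-time gates: prune the lowest-scoring
    heads (gate 0), mildly amplify the highest, and leave the rest unchanged.
    Quantile cutoffs and boost factor are placeholder choices."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.quantile(scores, [low_q, high_q])
    gates = np.ones_like(scores)
    gates[scores <= lo] = 0.0
    gates[scores >= hi] = boost
    return gates
```

In an actual LVLM, the gates would multiply each head's output before the attention projection; here they simply show how a training-free, score-driven intervention can be assembled per head.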
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7813