Head-Level Mechanistic Attribution for Hallucination Control: Training-Free Counteractive Pruning in LVLMs

Submitted to ICLR 2026 on 16 Sept 2025 (modified: 11 Feb 2026). License: CC BY 4.0
Keywords: Vision-Language Models; Object Hallucination; Attention Head Attribution; Dynamic Pruning; InfoSpectralScore
TL;DR: We introduce a fine-grained attribution and pruning method for vision-language models that substantially reduces object hallucinations while preserving caption informativeness, without additional training.
Abstract: Large vision-language models (LVLMs) excel at multimodal tasks but often produce instance-level object hallucinations, describing objects that do not appear in the input image. Because existing methods overlook functional conflicts among attention heads and lack principled, fine-grained attribution and intervention at the head level, hallucination suppression often comes at a substantial cost in semantic informativeness. To overcome these limitations, we propose HACP, a unified framework that enables fine-grained internal hallucination control via precise intervention at the attention-head level. Specifically, we introduce InfoSpectralScore, a novel attribution metric based on eigen-decomposition with spectral variance and entropy penalties, which accurately identifies hallucination-inducing heads. We further develop a dynamic, training-free pruning strategy that adaptively suppresses hallucination-prone heads while reinforcing faithful heads during inference. Extensive experiments across multiple LVLMs and benchmarks demonstrate that HACP achieves state-of-the-art hallucination mitigation, substantially reducing hallucinations while preserving caption informativeness better than existing approaches, and thus offers a robust and transferable solution for controllable and interpretable multimodal generation. The source code will be released upon acceptance.
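The abstract does not give the formulas behind InfoSpectralScore or the pruning rule, so the following is only a minimal illustrative sketch of the two-stage idea it describes: (1) score each attention head via an eigen-decomposition with variance and entropy penalties on the spectrum, and (2) at inference time, gate heads by their scores, zeroing out the lowest-scoring ("hallucination-prone") heads and mildly boosting the highest-scoring ("faithful") ones. The function names, the exact combination of terms, and the quantile thresholds below are all assumptions, not the authors' definitions.

```python
import numpy as np

def info_spectral_score(attn, var_weight=0.5, ent_weight=0.5):
    """Hypothetical head-attribution score in the spirit of InfoSpectralScore.

    attn: (T, T) attention matrix for one head. We eigen-decompose its
    symmetrized Gram matrix and reward energy concentrated in the leading
    mode, while penalizing spectral variance and spectral entropy.
    The specific form and weights are illustrative assumptions.
    """
    gram = attn @ attn.T                         # symmetric PSD matrix
    eigvals = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    p = eigvals / (eigvals.sum() + 1e-12)        # normalized spectrum
    info = p.max()                               # leading-mode concentration
    var_pen = p.var()                            # spectral variance penalty
    ent_pen = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))  # entropy in [0, 1]
    return info - var_weight * var_pen - ent_weight * ent_pen

def head_gates(scores, low_q=0.1, high_q=0.9, boost=1.1):
    """Map per-head scores to inference-time gates: prune the lowest-scoring
    heads (gate 0), mildly amplify the highest, and leave the rest unchanged.
    Quantile cutoffs and boost factor are placeholder choices."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.quantile(scores, [low_q, high_q])
    gates = np.ones_like(scores)
    gates[scores <= lo] = 0.0
    gates[scores >= hi] = boost
    return gates
```

In an actual LVLM, the gates would multiply each head's output before the attention projection; here they simply show how a training-free, score-driven intervention can be assembled per head.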
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7813