HalluTrace: Causal Attribution and Source-Targeted Decoding for Hallucination in Large Vision-Language Models

Published: 05 May 2026 · Last Modified: 12 May 2026 · 4th ALVR Poster · CC BY 4.0
Keywords: hallucination attribution, vision-language models, causal decoding
TL;DR: HALLUTRACE attributes LVLM object hallucination to visual grounding failure, language prior dominance, or cross-modal conflict via causal ablations; HAD routes targeted decoding interventions accordingly, reducing CHAIR_I by 3.7–5.6 points.
Abstract: Object hallucination in large vision-language models (LVLMs) is well-documented, but the mechanisms that produce it remain poorly understood. We introduce HALLUTRACE, a causal attribution framework that decomposes hallucination into three distinct sources: (VGF) visual grounding failure, where the visual encoder produces a representation insufficient to identify the target object; (LPD) language prior dominance, where the language model overrides a correct visual signal with a statistically driven prediction; and (CMC) cross-modal conflict, where visual and linguistic signals are irreconcilably inconsistent and the model resolves the conflict incorrectly. We operationalise these sources via causal component ablations: intervening on f_vis, f_proj, and f_LM independently and measuring the change in CHAIR score. Experiments on five LVLMs show that attribution patterns are object-category-specific and model-consistent: person/vehicle hallucinations are predominantly LPD (≥52%), food/furniture hallucinations are predominantly VGF (≥44%), and animal hallucinations split between VGF and CMC. Guided by these attributions, we design HAD (Hallucination-Aware Decoding), a unified decoding strategy that applies source-targeted interventions: visual signal amplification for VGF, language prior suppression for LPD, and contrastive re-weighting for CMC. HAD reduces CHAIR_I by 3.7–5.6 points and improves POPE F1 by 1.9–3.1 points over LLaVA-1.5, outperforming VCD and ICD on all three benchmarks (CHAIR, POPE, MME) without any additional training. We further show that the attribution-decoding correspondence is tight: the CHAIR improvement from HAD is linearly predictable from the VGF attribution share (r = 0.86, p < 10^-6), validating the causal framework.
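The abstract describes HAD as routing one of three logit-level interventions based on the attributed hallucination source. The paper's exact update rules are not given here, so the sketch below is a hypothetical reconstruction under stated assumptions: the function name `had_adjust_logits`, the parameter `alpha`, and the specific arithmetic for each branch are illustrative choices, using only a full (image-conditioned) logit vector and a text-only (visually ablated) logit vector in the style of contrastive decoding methods such as VCD.

```python
import numpy as np

def had_adjust_logits(logits_full, logits_text_only, source, alpha=1.0):
    """Hypothetical sketch of HAD-style source-targeted decoding.

    logits_full:      next-token logits with the image attended
    logits_text_only: next-token logits with the visual input ablated
    source:           attributed hallucination source ("VGF", "LPD", "CMC")
    alpha:            intervention strength (assumed hyperparameter)
    """
    if source == "VGF":
        # Visual signal amplification: sharpen the image-conditioned
        # distribution (a logit-space proxy for boosting the visual signal).
        return (1.0 + alpha) * logits_full
    if source == "LPD":
        # Language prior suppression: subtract the text-only logits so
        # tokens favoured purely by the language prior are down-weighted.
        return logits_full - alpha * logits_text_only
    if source == "CMC":
        # Contrastive re-weighting: extrapolate away from the text-only
        # distribution along the image-vs-text contrast direction.
        return (1.0 + alpha) * logits_full - alpha * logits_text_only
    return logits_full

full = np.array([2.0, 1.0])       # image-conditioned logits
text = np.array([1.0, 2.0])       # text-only logits
print(had_adjust_logits(full, text, "LPD"))  # → [ 1. -1.]
```

In this toy example the language prior prefers token 1 while the image supports token 0; the LPD branch flips the preference back toward the visually supported token. The design choice of routing per attributed source, rather than applying one global contrastive rule, is what distinguishes HAD from uniform methods like VCD in the abstract's framing.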
Submission Number: 48