Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding

Liu Yu; Can Chen; PING KUANG; Zhikun Feng; Fan Zhou; Gillian Dobbie

Published: 30 Apr 2026, Last Modified: 24 Jun 2026. Venue: ICML 2026 regular. License: CC BY 4.0

Abstract: Large Vision-Language Models (LVLMs) exhibit sophisticated reasoning but remain susceptible to object hallucination. Departing from the prevailing *attention intensity assumption*, we reveal a deeper, dynamic structural misalignment: hallucination is triggered at decision-critical steps where specific attention heads, acting as risky mediators, decouple from visual evidence and lock onto language priors. This establishes a pathological shortcut that bypasses visual grounding. To dismantle it, we propose **Fox** (*F*aithfulness and *O*bservational-flow via e*X*pression-rectification), a training-free inference-time framework. **Fox** diagnoses structural misalignment with a visual attention entropy probe that localizes risky mediators without supervision. We then execute a targeted causal intervention via numerical logit saturation to sever the shortcut path. Finally, a conflict-gated cooperative decoding strategy reconciles interventional faithfulness with observational fluency. Extensive experiments demonstrate that **Fox** achieves state-of-the-art performance, outperforming SID by $29.1\%$ while preserving linguistic richness. Code is available at <https://github.com/Cc2021start/Fox>.
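To make the idea of a visual attention entropy probe concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes the probe can be approximated by the Shannon entropy of each head's attention after renormalizing over the image-token positions. The names `visual_attention_entropy`, `per_head_entropy`, and `visual_idx` are hypothetical, and the abstract does not state how entropy values are mapped to "risky" heads, so no threshold is applied here.

```python
import math

def visual_attention_entropy(attn_row, visual_idx, eps=1e-12):
    """Shannon entropy (in nats) of one head's attention over visual tokens.

    attn_row:   attention weights of a single head at the current decoding
                step, over all key positions (text and image tokens).
    visual_idx: positions of the image tokens (hypothetical input).
    """
    weights = [attn_row[i] for i in visual_idx]
    total = sum(weights)
    if total <= eps:
        # Assumption: a head with no mass on visual tokens is treated as
        # maximally uncertain (uniform) rather than raising an error.
        return math.log(len(weights))
    probs = [w / total for w in weights]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def per_head_entropy(attn, visual_idx):
    """Apply the probe to every head; attn is a list of per-head rows."""
    return [visual_attention_entropy(row, visual_idx) for row in attn]

if __name__ == "__main__":
    # Two toy heads over four image tokens (positions 0-3) and one text token.
    attn = [
        [0.1, 0.1, 0.1, 0.1, 0.6],  # visual mass spread evenly
        [0.4, 0.0, 0.0, 0.0, 0.6],  # visual mass on a single image token
    ]
    print(per_head_entropy(attn, visual_idx=[0, 1, 2, 3]))
```

In this toy input the first head spreads its visual attention evenly and gets the maximum entropy for four tokens, while the second concentrates on one image token and gets zero. How such scores would feed the logit-saturation intervention is described in the paper itself, not in this sketch.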

Lay Summary: AI systems that answer questions about images can sometimes describe things that are not actually there, such as inventing objects or attributes. This makes them harder to trust in real-world uses where visual accuracy matters. Our work studies why these mistakes happen and finds that the model is not simply “looking too little” at the image. Instead, some parts of the model can rely too strongly on learned language habits, especially when deciding what to say next. We propose Fox, a method that detects these risky parts during generation and reduces their influence, without retraining the model. The method also keeps the model’s ability to produce detailed and natural answers, so it does not become overly cautious. In tests across several image-language AI systems, Fox reduces false visual claims while keeping answers useful and fluent, with little extra cost. This work offers a practical step toward more reliable AI systems that can describe and reason about visual content more faithfully.

Primary Area: Social Aspects->Safety

Keywords: Hallucination mitigation, LVLMs, causal mechanism

Submission Number: 3458