TCI: Mitigating Hallucination in LVLMs Via Text Contrastive Intervention

19 Sept 2025 (modified: 17 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: hallucination, LVLMs
TL;DR: a training-free approach to mitigating hallucination in LVLMs
Abstract: Large Vision-Language Models (LVLMs) have achieved remarkable progress across a wide range of tasks by integrating visual and textual information. Yet they still suffer from a common issue: hallucination, where the generated text fails to accurately align with visual inputs. Existing contrastive methods primarily intervene on the visual modality, perturbing images to indirectly amplify language priors, but they do not directly target text to expose and mitigate text bias. To address this, we propose \textbf{T}ext \textbf{C}ontrastive \textbf{I}ntervention (TCI), a training-free approach that amplifies visual information in the attention layers most susceptible to language bias. Our method is inspired by a key observation: the \textit{repetition phenomenon}, where LVLMs tend to repeat text verbatim when conflicts arise between images and accompanying text. We hypothesize that this behavior stems from language priors, a critical cause of hallucinations. TCI operates in two steps: it first quantifies per-layer attention shifts under text perturbation to identify the layers where visual attention is most compromised, then selectively boosts the corresponding visual-attention weights during generation, steering the model away from text bias. Extensive experiments show that TCI significantly reduces hallucinations while requiring only a small amount of data, demonstrating both its effectiveness and its efficiency.
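The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the mean-absolute-shift metric, and the scale-and-renormalize boosting rule are all assumptions made for exposition.

```python
# Hypothetical sketch of TCI's two steps. All names and formulas here are
# illustrative assumptions, not the paper's actual implementation.

def attention_shift(vis_attn_orig, vis_attn_perturbed):
    """Per-layer change in visual-attention mass when the text is perturbed.

    Each argument is a list with one scalar per layer: the attention mass
    that layer places on visual tokens.
    """
    return [abs(a - b) for a, b in zip(vis_attn_orig, vis_attn_perturbed)]

def select_layers(shifts, k):
    """Indices of the k layers whose visual attention shifts the most
    (i.e., the layers most susceptible to language bias)."""
    return sorted(range(len(shifts)), key=lambda i: shifts[i], reverse=True)[:k]

def boost_visual_attention(weights, visual_indices, alpha):
    """Scale attention weights on visual tokens by alpha, then renormalize
    so the weights still sum to one."""
    boosted = [w * alpha if i in visual_indices else w
               for i, w in enumerate(weights)]
    total = sum(boosted)
    return [w / total for w in boosted]
```

In a real LVLM, the boosting step would be applied inside the selected attention layers during generation; here it is shown on a plain list of weights only to make the renormalization explicit.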
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18745