Mitigating Gaslighting by Relocating Text-induced Visual Attention Bias

16 Sept 2025 (modified: 20 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: LMMs, Gaslighting, Attention
Abstract: While hallucination in Large Multimodal Models (LMMs) is a well-documented challenge, a more nuanced issue is emerging: LMMs can be misled by plausible but incorrect textual inputs into overriding factual visual evidence, a phenomenon known as “gaslighting.” To investigate the mechanism underlying this vulnerability, we analyze text-to-image attention patterns and uncover a systemic bias that we term Text-Induced Visual Attention Bias (TVAB). We discover that language tokens, irrespective of their semantic content, disproportionately allocate attention to fixed spatial regions of the image. Our findings indicate that this bias originates in the initial layers and is amplified through subsequent layers, ultimately corrupting the model's perception. To address this vulnerability, we propose the Fixed Attention Bias Perception and Redistribution (FAPR) framework. The method efficiently identifies the attention bias and mitigates it by suppressing the biased weights and reallocating them to other text-to-image pathways. Extensive evaluations on a diverse set of benchmarks, including GaslightingBench, POPE, MMMU, AI2 Diagram, and MMBench, demonstrate the effectiveness of FAPR. Crucially, our method substantially reduces the model's vulnerability to gaslighting without compromising its core reasoning capabilities on general tasks. This is achieved with a negligible increase in inference latency, demonstrating a practical path toward more trustworthy LMMs.
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6519
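
The redistribution idea described in the abstract can be illustrated with a short sketch: flag image positions that attract attention regardless of the text, zero them out, and renormalize so each text token keeps its original total attention to the image. This is a minimal, hypothetical rendering under those assumptions; the helper names (find_biased_positions, redistribute_attention), tensor layout, and thresholding rule are illustrative only and are not the authors' FAPR implementation.

import torch

def find_biased_positions(text_to_image_attn, threshold):
    # text_to_image_attn: (heads, num_text_tokens, num_image_tokens), post-softmax.
    # Flag image positions that receive high attention averaged over all heads and
    # text tokens, i.e. regardless of what the text says (the TVAB observation).
    mean_per_position = text_to_image_attn.mean(dim=(0, 1))  # (num_image_tokens,)
    return mean_per_position > threshold                      # bool mask

def redistribute_attention(text_to_image_attn, biased_positions, eps=1e-6):
    # Zero out attention at the biased image positions and scale up the remaining
    # positions so each text token keeps the same total attention mass it
    # originally placed on the image.
    attn = text_to_image_attn.clone()
    removed = attn[..., biased_positions].sum(dim=-1, keepdim=True)
    attn[..., biased_positions] = 0.0
    keep = ~biased_positions
    remaining = attn[..., keep].sum(dim=-1, keepdim=True).clamp_min(eps)
    attn[..., keep] = attn[..., keep] * (1.0 + removed / remaining)
    return attn

# Toy usage: 8 heads, 12 text tokens attending over 576 image patches.
attn = torch.softmax(torch.randn(8, 12, 576), dim=-1)
mask = find_biased_positions(attn, threshold=1.0 / 576)  # above-uniform positions (illustrative)
fixed = redistribute_attention(attn, mask)
assert torch.allclose(attn.sum(-1), fixed.sum(-1), atol=1e-5)  # per-token mass preserved

Per-token renormalization is one simple way to "reallocate the suppressed attention weight to other text-to-image pathways" as the abstract describes; the actual perception and redistribution steps of FAPR are specified in the paper.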