Keywords: Multimodal Large Reasoning Models, Hallucination Mitigation, Reasoning
TL;DR: We propose a lightweight, plug-and-play method for mitigating hallucinations in multimodal large reasoning models (MLRMs) by decomposing perception and reasoning stages and regulating functional attention heads.
Abstract: Multimodal large reasoning models (MLRMs) are rapidly advancing vision-language reasoning and are emerging as a foundation for cross-modal intelligence.
However, hallucination remains a persistent failure mode, manifesting as erroneous reasoning chains and misinterpretations of visual content.
In this study, we observe that attention heads exhibit a staged division of labor: **shallow** heads predominantly serve perception, while **deeper** heads shift toward symbolic reasoning. This division exposes two major causes of hallucination: perceptual bias and reasoning drift.
To address these issues, we propose a lightweight and interpretable two-step plugin, Functional Head Identification and Class-conditioned Rescaling, which locates perception- and reasoning-oriented heads and regulates their contributions without retraining.
Evaluations on three real-world MLRMs (`Kimi-VL`, `Ocean-R1`, `R1-Onevision`), six benchmarks across three domains, and four baselines show that our plugin achieves an average improvement of **5\%** and up to **15\%**, with **less than 1\% additional computation** and only **9\%** of baseline latency. Our approach is fully model-agnostic and significantly enhances both the reliability and interpretability of off-the-shelf MLRMs, thereby enabling their safe deployment in high-stakes applications. Our code is available at https://anonymous.4open.science/r/Functional-Attention-Control.
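To make the two-step mechanism in the abstract concrete, the sketch below shows one plausible way functional head identification and class-conditioned rescaling could be wired into a transformer-style MLRM. It is not the paper's implementation: the per-head statistic `head_scores`, the threshold `tau`, and the scaling factors `alpha_percep` / `alpha_reason` are hypothetical placeholders used purely for illustration.

```python
# Minimal illustrative sketch (assumptions, not the paper's actual procedure):
# heads are split by a calibration-time statistic, then their outputs are
# rescaled class-conditionally at inference time, with no retraining.
import torch


def identify_functional_heads(head_scores: torch.Tensor, tau: float = 0.5):
    """Split heads into perception- vs reasoning-oriented sets.

    head_scores: (num_layers, num_heads) tensor of a per-head statistic,
    e.g. average attention mass on visual tokens collected on a small
    calibration set (hypothetical choice of statistic).
    """
    perception_mask = head_scores > tau   # heads attending mostly to image tokens
    reasoning_mask = ~perception_mask     # heads attending mostly to text / CoT tokens
    return perception_mask, reasoning_mask


def class_conditioned_rescale(head_outputs: torch.Tensor,
                              perception_mask: torch.Tensor,
                              alpha_percep: float = 1.1,
                              alpha_reason: float = 0.9) -> torch.Tensor:
    """Rescale one layer's per-head outputs before the output projection.

    head_outputs:     (batch, num_heads, seq_len, head_dim)
    perception_mask:  (num_heads,) boolean mask for this layer.
    """
    scale = torch.where(
        perception_mask,
        torch.tensor(alpha_percep, device=head_outputs.device),
        torch.tensor(alpha_reason, device=head_outputs.device),
    )
    return head_outputs * scale.view(1, -1, 1, 1).to(head_outputs.dtype)


if __name__ == "__main__":
    # Toy demo with random tensors (no real model involved).
    torch.manual_seed(0)
    scores = torch.rand(32, 16)                # 32 layers, 16 heads
    percep, reason = identify_functional_heads(scores)
    outputs = torch.randn(1, 16, 8, 64)        # one layer's head outputs
    rescaled = class_conditioned_rescale(outputs, percep[0])
    print(rescaled.shape)                      # torch.Size([1, 16, 8, 64])
```

In an actual deployment, such rescaling would most naturally be applied through forward hooks on each attention layer's per-head outputs before the output projection, which is what would keep the plugin lightweight and retraining-free.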
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7208