Keywords: Large Vision-Language Models, Object Hallucination, Model Interpretability, Visualization
Abstract: Large Vision-Language Models (LVLMs) have achieved remarkable success but continue to struggle with object hallucination (OH), generating outputs inconsistent with visual inputs. While previous work has proposed methods to reduce OH, the visual decision-making mechanisms that lead to hallucinations remain poorly understood.
In this paper, we propose VaLSe, a Vision-aware Latent Steering framework that mitigates OH through an interpretation-then-mitigation pipeline. VaLSe performs token-level visual attribution to trace how visual inputs contribute to individual output tokens, producing visual contribution maps that highlight the image regions most responsible for the generated words.
VaLSe then applies inference-time latent steering, guided by token-level indicators of visual support derived from these contribution maps, to realign internal representations toward semantically relevant content; this increases reliance on visually grounded signals and thereby reduces OH in the generated outputs (see the illustrative sketch below).
Experiments on multiple LVLMs and object hallucination benchmarks show that VaLSe consistently reduces OH while preserving generation quality. Additional analysis identifies recurring visually unsupported activations during decoding, suggesting limitations of existing hallucination evaluation metrics.
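As a rough illustration of the steering stage, the sketch below shows one common way inference-time latent steering can be applied to a decoder layer of an LVLM via a forward hook. The module path, the steering direction, and the scale alpha are assumptions for illustration only, not VaLSe's actual implementation; in the paper's framing, the direction would be derived from the token-level visual contribution maps described above.

    import torch

    def register_latent_steering_hook(layer: torch.nn.Module,
                                      direction: torch.Tensor,
                                      alpha: float = 4.0):
        """Steer a decoder layer's hidden states along `direction` at every
        forward pass. `direction` is a hypothetical [hidden_dim] vector, e.g.
        the mean difference between strongly and weakly visually grounded
        token representations collected offline."""
        unit = direction / direction.norm()

        def hook(module, inputs, output):
            # Decoder layers often return a tuple (hidden_states, ...).
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden + alpha * unit.to(device=hidden.device, dtype=hidden.dtype)
            return (steered,) + output[1:] if isinstance(output, tuple) else steered

        return layer.register_forward_hook(hook)

A hypothetical usage against a HuggingFace-style LVLM might be: handle = register_latent_steering_hook(model.language_model.model.layers[-4], direction), then model.generate(**inputs), then handle.remove(); the layer index and attribute path depend on the specific model and are not specified by the abstract.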
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Safety and alignment
Languages Studied: English
Submission Number: 3225