Keywords: Large Vision-Language Models, Object Hallucination, Model Interpretability, Visualization
Abstract: Large Vision-Language Models (LVLMs) have achieved remarkable success but continue to struggle with object hallucination (OH), generating outputs inconsistent with visual inputs. While previous work has proposed methods to reduce OH, the visual decision-making mechanisms that lead to hallucinations remain poorly understood.
In this paper, we propose VaLSe, a Vision-aware Latent Steering framework that mitigates OH through an interpretation-then-mitigation pipeline. VaLSe performs token-level visual attribution to trace how visual inputs contribute to individual output tokens, producing visual contribution maps that highlight the image regions most responsible for the generated words.
VaLSe then applies inference-time latent steering, guided by token-level indicators of visual support derived from these contribution maps, to realign internal representations toward semantically relevant content; this increases reliance on visually grounded signals and thereby reduces OH in the generated outputs (see the illustrative sketch below).
Experiments on multiple LVLMs and object hallucination benchmarks show that VaLSe consistently reduces OH while preserving generation quality. Additional analysis identifies recurring visually unsupported activations during decoding, suggesting limitations of existing hallucination evaluation metrics.
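As a rough illustration of the steering stage, the sketch below shows one common way inference-time latent steering can be applied to a decoder layer of an LVLM via a forward hook. The module path, the steering direction, and the scale alpha are assumptions for illustration only, not VaLSe's actual implementation; in the paper's framing, the direction would be derived from the token-level visual contribution maps described above.

    import torch

    def register_latent_steering_hook(layer: torch.nn.Module,
                                      direction: torch.Tensor,
                                      alpha: float = 4.0):
        """Steer a decoder layer's hidden states along `direction` at every
        forward pass. `direction` is a hypothetical [hidden_dim] vector, e.g.
        the mean difference between strongly and weakly visually grounded
        token representations collected offline."""
        unit = direction / direction.norm()

        def hook(module, inputs, output):
            # Decoder layers often return a tuple (hidden_states, ...).
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden + alpha * unit.to(device=hidden.device, dtype=hidden.dtype)
            return (steered,) + output[1:] if isinstance(output, tuple) else steered

        return layer.register_forward_hook(hook)

A hypothetical usage against a HuggingFace-style LVLM might be: handle = register_latent_steering_hook(model.language_model.model.layers[-4], direction), then model.generate(**inputs), then handle.remove(); the layer index and attribute path depend on the specific model and are not specified by the abstract.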
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Safety and alignment
Languages Studied: English
Submission Number: 3225