Keywords: Hallucination Mitigation; Sparse Autoencoders; Large Vision–Language Models; Activation Steering
TL;DR: We show that noise perturbations disrupt monosemantic neuron activations and induce hallucinations, and propose Contrastive Neuron Steering (CNS) to amplify truth neurons while suppressing spurious ones for hallucination mitigation.
Abstract: Large vision-language models (LVLMs) have achieved impressive performance in multimodal understanding and generation, yet they remain prone to hallucinations, particularly object hallucinations, in which described entities do not exist in the input image. Existing mitigation methods often focus on output-level adjustments, while the internal mechanisms driving hallucinations remain poorly understood. In this work, we adopt an internal representation-level perspective, introducing sparse autoencoders (SAEs) to decompose dense visual features into sparse monosemantic neurons for interpreting and steering LVLMs. Building on prior findings that injecting image noise exacerbates hallucinations, we investigate how noise perturbations reshape internal representations, revealing that noise alters monosemantic neuron activations, disrupts visual semantics, and induces hallucinations. We further show that manipulating specific neurons enables controllable influence over LVLM outputs. Based on these insights, we propose Contrastive Neuron Steering (CNS), which selectively amplifies truth neurons while suppressing perturbation-induced activations to mitigate hallucinations, and further improves understanding of image-specific features through adaptive neuron constraints and always-on neuron suppression. Extensive experiments and analyses demonstrate that CNS effectively reduces hallucinations. Moreover, CNS enables interpretable and controllable neuron-level interventions, providing both practical mitigation and mechanistic insight into how LVLMs encode, and sometimes misrepresent, visual information.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7850