FADE: Mitigating Hallucinations by Reducing Language Priors Dominance in Large Vision-Language Models
Keywords: Large Vision-Language Models, Hallucination, Language Priors Dominance, Training-Free
Abstract: Despite the impressive capabilities of Large Vision-Language Models (LVLMs), they remain susceptible to hallucination—generating content inconsistent with the input image. Recent studies attribute this to the dominance of language priors over visual inputs and employ contrastive decoding methods to mitigate it, but its mechanistic origin remains unexplored. We investigate the information flow through each transformer layer and find that attention modules consistently aggregate visual evidence, while FFN modules at critical layers act as the source of language priors. These priors can override visual evidence, causing correct predictions in intermediate layers to drift toward incorrect outputs. Based on this insight, we propose FADE (FFN Attenuation for DEcoding), a training-free method that attenuates FFN outputs to reduce language priors dominance. Evaluations on the POPE, CHAIR, and MME benchmarks across LLaVA-1.5, mPLUG-Owl2, and InstructBLIP show that FADE effectively mitigates hallucinations while preserving inference efficiency.
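The core intervention described in the abstract—scaling down the FFN branch of a transformer layer at decoding time while leaving the attention branch untouched—can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scale factor, the choice of which layers count as "critical," and the toy layer structure are all assumptions.

```python
import numpy as np

def transformer_layer(hidden, attn_fn, ffn_fn, ffn_scale=1.0):
    """One residual transformer layer with an attenuable FFN branch.

    hidden    : (seq, dim) hidden states entering the layer
    attn_fn   : callable for the attention sublayer (aggregates visual evidence)
    ffn_fn    : callable for the FFN sublayer (source of language priors)
    ffn_scale : < 1.0 damps the FFN contribution (FADE-style attenuation);
                1.0 recovers standard decoding. The value and the layers it
                is applied to are hypothetical choices here.
    """
    hidden = hidden + attn_fn(hidden)              # attention branch, untouched
    hidden = hidden + ffn_scale * ffn_fn(hidden)   # attenuated FFN branch
    return hidden

# Toy demonstration: with a zero attention branch and a constant FFN output,
# halving ffn_scale halves the FFN's contribution to the residual stream.
h = np.zeros((2, 4))
full = transformer_layer(h, lambda x: np.zeros_like(x), lambda x: np.ones_like(x), ffn_scale=1.0)
damped = transformer_layer(h, lambda x: np.zeros_like(x), lambda x: np.ones_like(x), ffn_scale=0.5)
```

Because the attenuation is a scalar multiply on an existing sublayer output, it adds no extra forward passes—consistent with the abstract's claim that inference efficiency is preserved, unlike contrastive decoding methods that require a second (distorted) forward pass.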
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Large Vision-Language Models, Multimodal Learning, Hallucination, Decoding
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2053