FADE: Mitigating Hallucinations by Reducing Language Priors Dominance in Large Vision-Language Models
Keywords: Large Vision-Language Models, Hallucination, Language Priors Dominance, Training-Free
Abstract: Despite the impressive capabilities of Large Vision-Language Models (LVLMs), they remain susceptible to hallucination—generating content inconsistent with the input image. Recent studies attribute this to the dominance of language priors over visual inputs and employ contrastive decoding methods to mitigate it, but its mechanistic origin remains unexplored. We investigate the information flow through each transformer layer and find that attention modules consistently aggregate visual evidence, while FFN modules at critical layers act as the source of language priors. These priors can override visual evidence, causing correct predictions in intermediate layers to drift toward incorrect outputs. Based on this insight, we propose FADE (FFN Attenuation for DEcoding), a training-free method that attenuates FFN outputs to reduce language priors dominance. Evaluations on the POPE, CHAIR, and MME benchmarks across LLaVA-1.5, mPLUG-Owl2, and InstructBLIP show that FADE effectively mitigates hallucinations while preserving inference efficiency.
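The core intervention described in the abstract—scaling down the FFN branch of a transformer layer at decoding time while leaving the attention branch untouched—can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scale factor, the choice of which layers count as "critical," and the toy layer structure are all assumptions.

```python
import numpy as np

def transformer_layer(hidden, attn_fn, ffn_fn, ffn_scale=1.0):
    """One residual transformer layer with an attenuable FFN branch.

    hidden    : (seq, dim) hidden states entering the layer
    attn_fn   : callable for the attention sublayer (aggregates visual evidence)
    ffn_fn    : callable for the FFN sublayer (source of language priors)
    ffn_scale : < 1.0 damps the FFN contribution (FADE-style attenuation);
                1.0 recovers standard decoding. The value and the layers it
                is applied to are hypothetical choices here.
    """
    hidden = hidden + attn_fn(hidden)              # attention branch, untouched
    hidden = hidden + ffn_scale * ffn_fn(hidden)   # attenuated FFN branch
    return hidden

# Toy demonstration: with a zero attention branch and a constant FFN output,
# halving ffn_scale halves the FFN's contribution to the residual stream.
h = np.zeros((2, 4))
full = transformer_layer(h, lambda x: np.zeros_like(x), lambda x: np.ones_like(x), ffn_scale=1.0)
damped = transformer_layer(h, lambda x: np.zeros_like(x), lambda x: np.ones_like(x), ffn_scale=0.5)
```

Because the attenuation is a scalar multiply on an existing sublayer output, it adds no extra forward passes—consistent with the abstract's claim that inference efficiency is preserved, unlike contrastive decoding methods that require a second (distorted) forward pass.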
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Large Vision-Language Models, Multimodal Learning, Hallucination, Decoding
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2053