Bidirectional Cognitive Anchoring: Parameter-Efficient Hallucination Mitigation in Large Vision-Language Models
Abstract: Large Vision-Language Models (LVLMs) have made remarkable progress but remain prone to hallucinations, especially in complex reasoning and long-text generation. Existing inference-time interventions provide limited mitigation, as they primarily rely on unidirectional visual enhancement or static feature injection, and struggle to address persistent issues such as language prior dominance, visual amnesia, and instruction-irrelevant visual noise. To address these challenges, we propose **Bidirectional Cognitive Anchoring (BCA)**, a parameter-efficient framework that inserts lightweight **Cognitive Anchors (CogA)** into frozen Transformer decoding layers. BCA establishes an asymmetric bidirectional interaction mechanism: *Evidence Anchoring (V$\rightarrow$L)* continuously injects calibrated visual summaries into the language stream to mitigate visual amnesia in long-sequence generation, while *Questioning Anchoring (L$\rightarrow$V)* selectively filters instruction-irrelevant visual noise using linguistic hypotheses. A mask-aware training strategy further enables deep cross-modal calibration while preventing answer leakage. Extensive experiments on POPE, CHAIR, and MME show that BCA significantly reduces hallucinations, achieving up to **25%** lower CHAIR$_S$ and consistent gains in adversarial object verification, while preserving overall multimodal reasoning performance. Moreover, BCA incurs only marginal inference overhead, making it well-suited for practical and robust LVLM deployment.
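For readers unfamiliar with the CHAIR metrics cited above, the following is a minimal sketch of how CHAIR$_S$ (the fraction of captions containing at least one hallucinated object) and CHAIR$_I$ (the fraction of mentioned objects that are hallucinated) are commonly computed. It is not the evaluation code used in this work. The function name `chair_scores` and the toy captions are illustrative assumptions, and the sketch simplifies the standard protocol by treating each caption's objects as a set and skipping the synonym mapping to MS-COCO categories.

```python
from typing import Iterable, Set, Tuple


def chair_scores(
    captions: Iterable[Tuple[Set[str], Set[str]]],
) -> Tuple[float, float]:
    """Return (CHAIR_S, CHAIR_I) for (mentioned, ground_truth) object-set pairs.

    Hypothetical helper for illustration only; assumes object names are
    already normalized to a shared vocabulary.
    """
    n_captions = 0
    n_hallucinated_captions = 0
    n_mentions = 0
    n_hallucinated_mentions = 0
    for mentioned, ground_truth in captions:
        # An object is hallucinated if it is mentioned but not in the image.
        hallucinated = mentioned - ground_truth
        n_captions += 1
        n_hallucinated_captions += 1 if hallucinated else 0
        n_mentions += len(mentioned)
        n_hallucinated_mentions += len(hallucinated)
    # Guard against empty input to avoid division by zero.
    chair_s = n_hallucinated_captions / n_captions if n_captions else 0.0
    chair_i = n_hallucinated_mentions / n_mentions if n_mentions else 0.0
    return chair_s, chair_i


# Toy data (made up for illustration, not taken from the paper).
toy = [
    ({"dog", "frisbee"}, {"dog", "frisbee", "tree"}),  # no hallucination
    ({"cat", "sofa", "remote"}, {"cat", "sofa"}),      # "remote" is hallucinated
    ({"car", "person"}, {"car"}),                      # "person" is hallucinated
    ({"pizza"}, {"pizza", "table"}),                   # no hallucination
]
chair_s, chair_i = chair_scores(toy)
print(f"CHAIR_S = {chair_s:.2f}, CHAIR_I = {chair_i:.2f}")
```

On the toy data, two of the four captions contain a hallucinated object and two of the eight mentioned objects are hallucinated, so lower values of either score indicate fewer hallucinations.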