Keywords: vision-language models; visual hallucination
Abstract: Vision-Language Models (VLMs) have achieved impressive progress across a range of multimodal tasks but remain highly susceptible to visual hallucination, producing text that contradicts the visual input. Existing mitigation strategies often rely on additional large-scale VLMs or multi-stage decoding, hindering efficiency and broad applicability. In this work, we identify redundant and noisy image features as a primary cause of hallucination: they degrade the model’s ability to capture semantically relevant visual content. Accordingly, we propose VIBRA (Vision-Language Information Bottleneck with Redundancy Awareness), a lightweight, plug-and-play module that adaptively filters out redundant visual information while preserving task-relevant semantics at both the token and feature levels. Specifically, VIBRA employs a multi-modal information bottleneck to retain image features aligned with the textual input, and introduces adaptive token filtering via spectral clustering and compression-aware pruning to eliminate instance-specific redundancy. Additionally, we design a Binary-Guided loss that sharpens the separation between informative and noisy features, enabling more effective gating of visual information. Extensive experiments demonstrate that VIBRA consistently improves visual reasoning and reduces hallucination across a variety of VLM architectures.
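To make the described pipeline concrete, below is a minimal, illustrative PyTorch sketch of how a VIBRA-style module *might* combine a text-conditioned bottleneck gate, spectral clustering of visual tokens, and a binarization penalty. Every name and design choice here (`VIBRAGate`, `spectral_clusters`, `binary_guided_penalty`, the cluster count, the keep ratio) is a hypothetical reading of the abstract, not the authors' implementation.

```python
# Hypothetical sketch of a VIBRA-style gate; assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def spectral_clusters(tokens: torch.Tensor, k: int, iters: int = 10) -> torch.Tensor:
    """Cluster N tokens via the similarity graph's low-frequency eigenvectors,
    refined with a small k-means loop. Returns an (N,) tensor of cluster ids."""
    with torch.no_grad():
        z = F.normalize(tokens, dim=-1)
        adj = (z @ z.T).clamp(min=0)             # non-negative similarity graph
        lap = torch.diag(adj.sum(-1)) - adj      # unnormalized graph Laplacian
        _, evecs = torch.linalg.eigh(lap)        # eigenvectors, ascending order
        emb = F.normalize(evecs[:, :k], dim=-1)  # spectral embedding of tokens
        centers = emb[torch.randperm(emb.size(0))[:k]].clone()
        for _ in range(iters):                   # plain k-means refinement
            ids = torch.cdist(emb, centers).argmin(-1)
            for c in range(k):
                mask = ids == c
                if mask.any():
                    centers[c] = emb[mask].mean(0)
        return ids


def binary_guided_penalty(p: torch.Tensor) -> torch.Tensor:
    # One plausible reading of a "Binary-Guided" objective: push gate
    # probabilities toward {0, 1} so informative and noisy tokens separate.
    return (p * (1.0 - p)).mean()


class VIBRAGate(nn.Module):
    """Text-conditioned bottleneck gate plus cluster-aware token pruning
    (an assumed composition of the components named in the abstract)."""

    def __init__(self, dim: int, n_clusters: int = 8, keep_ratio: float = 0.5):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(),
                                  nn.Linear(dim, 1))
        self.n_clusters, self.keep_ratio = n_clusters, keep_ratio

    def forward(self, visual: torch.Tensor, text: torch.Tensor):
        # visual: (N, d) image tokens; text: (T, d) prompt tokens
        ctx = text.mean(0, keepdim=True).expand(visual.size(0), -1)
        p = torch.sigmoid(self.gate(torch.cat([visual, ctx], -1))).squeeze(-1)

        ids = spectral_clusters(visual, self.n_clusters)
        k = max(1, int(self.keep_ratio * visual.size(0)))
        # Rank tokens by gate probability, but always keep each cluster's
        # best token so no semantic region is dropped entirely
        # (empty clusters harmlessly fall back to token 0 here).
        best = torch.stack([p.masked_fill(ids != c, -1.0).argmax()
                            for c in range(self.n_clusters)])
        ranked = p.argsort(descending=True)[:k]
        keep = torch.unique(torch.cat([best, ranked]))
        return visual[keep] * p[keep].unsqueeze(-1), p
```

In this sketch the gate is soft (tokens are rescaled by their keep probability) while pruning is hard (top-k plus one representative per spectral cluster); `binary_guided_penalty(p)` would be added to the training loss to drive the two regimes apart. Whether VIBRA uses this exact combination is not stated in the abstract.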
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6822