Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

27 Sept 2024 (modified: 14 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Vision Language Models
TL;DR: Reducing LVLM attentional bias helps mitigate hallucinations
Abstract: This study seeks to understand and address a phenomenon observed in Large Vision Language Models (LVLMs) related to their attention mechanism. Interestingly, LVLMs tend to disproportionately focus on a few image tokens that lack meaningful, query-related semantics, producing sharp outliers in the attention maps; we refer to these as blind tokens. A well-designed attention mechanism should assign higher weights to the most relevant tokens, but here the attention imbalance overemphasizes uninformative tokens instead. Our analysis shows that tokens receiving lower attention weights often hold critical information necessary for capturing subtle visual details. We hypothesize that over-reliance on blind tokens contributes to hallucinations in LVLMs. To address this, we introduce a novel decoding technique called Attentional Vision Calibration (AVISC). During the decoding phase, AVISC identifies blind tokens by examining the image-wise attention distribution and dynamically adjusts the prediction logits. Specifically, it contrasts the logits conditioned on the original visual tokens with those conditioned on the blind tokens, thereby reducing the model's dependency on blind tokens and encouraging a more balanced consideration of all visual tokens. We validate AVISC on standard hallucination benchmarks, including POPE, MME, and AMBER, where it consistently outperforms existing decoding techniques.
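The abstract's decoding rule can be read as a contrastive-decoding variant: flag attention outliers as blind tokens, then subtract the distribution those tokens alone would induce. Below is a minimal PyTorch sketch under that reading; the function names, the outlier threshold tau, and the contrast strength alpha are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the AVISC-style decoding step described in the abstract.
# find_blind_tokens, contrastive_logits, tau, and alpha are illustrative names,
# not taken from the paper's code.
import torch

def find_blind_tokens(attn_weights: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Flag image tokens whose attention weight is a sharp outlier.

    attn_weights: (num_image_tokens,) attention mass the current query assigns
                  to each image token (e.g. averaged over heads and layers).
    Returns a boolean mask where True marks a suspected blind token.
    """
    mean, std = attn_weights.mean(), attn_weights.std()
    return attn_weights > mean + tau * std

def contrastive_logits(logits_full: torch.Tensor,
                       logits_blind: torch.Tensor,
                       alpha: float = 1.0) -> torch.Tensor:
    """Contrast two next-token distributions.

    logits_full:  logits conditioned on the original visual tokens.
    logits_blind: logits conditioned on the blind tokens only.
    Down-weights whatever the blind tokens alone would predict.
    """
    return (1 + alpha) * logits_full - alpha * logits_blind

# Toy usage: in a real LVLM you would run two forward passes, one with the
# original image tokens and one restricted to the detected blind tokens.
attn = torch.softmax(torch.randn(576), dim=0)      # attention over 576 image tokens
blind_mask = find_blind_tokens(attn)
logits_full, logits_blind = torch.randn(2, 32000)  # stand-in vocabulary logits
next_token = contrastive_logits(logits_full, logits_blind).argmax(-1)
```

The subtraction mirrors standard contrastive decoding: whatever signal the blind tokens contribute on their own is discounted, which pushes the final distribution toward predictions grounded in the remaining, lower-attention visual tokens.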
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9071