Keywords: Large Vision-Language Model, Hallucination, Synthetic Data, Preference Alignment
Abstract: Large Vision-Language Models (LVLMs) have achieved impressive performance across various vision-language tasks. However, hallucinations, i.e., generating counterfactual responses, remain a significant challenge. Although recent methods have mitigated hallucinations in tasks such as object existence and image description, they primarily address hallucinations in response generation while overlooking the task question itself. This paper highlights the vulnerability of LVLMs in solving fictitious presupposition questions (FPQs), where the models are prone to accept the presuppositions of non-existent objects and produce severe hallucinatory responses. To address this, we first introduce a novel benchmark, VFP-Bench, to evaluate LVLMs' capability to discriminate fictitious presuppositions and generate factual responses. Moreover, we introduce Antidote, a universal, synthetic data-driven self-correction solution for alleviating hallucination in both FPQs and conventional tasks. It leverages synthetic data to incorporate factual priors into questions/queries to achieve self-correction, recasting hallucination alleviation as a preference optimization problem. Applied to the LLaVA series, it improves performance on VFP-Bench by over 50%, on POPE by 1.8–3.3%, and on CHAIR & SHR by 30–50%, without relying on external supervision from stronger LVLMs or human feedback, and without introducing noticeable catastrophic forgetting.
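For context, the "preference optimization problem" mentioned in the abstract is typically instantiated with a DPO-style objective over paired responses. The sketch below is a minimal, assumed illustration: the abstract does not specify the exact loss, so the DPO form, the pairing of factual (chosen) vs. hallucinatory (rejected) responses, and the `beta=0.1` default are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    ref_chosen_logps: torch.Tensor,
                    ref_rejected_logps: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss over (factual, hallucinatory) response pairs.

    Each argument is a batch of per-response log-probabilities
    (summed over tokens) under the trainable policy or a frozen
    reference model. `beta` scales the implicit reward margin;
    0.1 is a common default, not a value from the paper.
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Push the policy to prefer factual over hallucinatory responses
    # more strongly than the frozen reference model does.
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

In this framing, the synthetic FPQ data would supply the preference pairs: a response that rejects the fictitious presupposition is treated as "chosen" and a hallucinatory response that accepts it as "rejected".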
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5568