CoReVe: Mitigating Object Hallucinations in Large Vision-Language Models via Chain-of-Region Verification
Keywords: Hallucination Mitigation, Large Vision-Language Models, Chain-of-Region Verification
TL;DR: We propose a region-aware visual chain-of-verification method to mitigate object hallucinations in LVLMs.
Abstract: Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., claims about objects that are not present in the visual input. To address this challenge, we propose Chain-of-Region Verification (CoReVe), a region-aware visual chain-of-verification method that mitigates object hallucinations in LVLMs in a post-hoc manner. Motivated by how humans comprehend intricate visual information---often focusing on specific image regions or details within a given sample---we elicit such region-level processing from LVLMs and use it as a chaining cue to detect and mitigate object hallucinations. Specifically, CoReVe consists of six steps: initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation. As a simple yet effective method, CoReVe can be seamlessly integrated into various LVLMs in a training-free manner, without relying on external detection models. Extensive experiments on four hallucination benchmarks across four LVLMs demonstrate that CoReVe significantly alleviates hallucinations in LVLMs. Code will be released to facilitate future research.
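The six-step chain described in the abstract can be sketched as a prompting pipeline. This is only a hypothetical illustration: the prompt wordings, the `query_lvlm` interface, and the parsing of model outputs are assumptions, not the paper's actual implementation.

```python
def coreve(query_lvlm, image, question):
    """Hypothetical sketch of the CoReVe chain.

    `query_lvlm(image, prompt)` is an assumed caller-supplied function that
    queries the underlying LVLM; prompt phrasings below are illustrative.
    """
    # Step 1: initial response generation.
    initial = query_lvlm(image, question)
    # Step 2: entity extraction from the initial response.
    entities = query_lvlm(image, f"List the objects mentioned in: {initial}")
    # Step 3: coordinate generation -- the LVLM itself proposes a region
    # per entity, so no external detector is needed.
    coords = {e: query_lvlm(image, f"Give the bounding box of the {e}.")
              for e in entities}
    # Step 4: region description for each proposed region.
    descs = {e: query_lvlm(image, f"Describe the region {coords[e]}.")
             for e in entities}
    # Step 5: verification execution -- check each entity against its
    # region description to flag hallucinated objects.
    verified = {e: query_lvlm(
                    image,
                    f"Does this description mention a {e}? {descs[e]}") == "yes"
                for e in entities}
    # Step 6: final response generation, conditioned on verified entities.
    kept = [e for e, ok in verified.items() if ok]
    return query_lvlm(
        image,
        f"Answer '{question}' again, mentioning only these objects: {kept}")
```

Because the LVLM is passed in as a function, the same chain wraps any model in a training-free, post-hoc manner, matching the plug-and-play claim in the abstract.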
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6314