Object Hallucination Detection in Large Vision Language Models via Evidential Conflict

Published: 01 Jan 2024, Last Modified: 05 Mar 2025 · BELIEF 2024 · CC BY-SA 4.0
Abstract: Despite their remarkable ability to understand both textual and visual data, large vision-language models (LVLMs) still suffer from hallucination. This manifests in particular as object hallucination, where the models inaccurately describe objects in the images. Current efforts mainly detect such erroneous behavior either by checking the semantic consistency of outputs across multiple inference runs or by evaluating the entropy-based uncertainty of predictions. However, the former is resource-intensive, while the latter is often imprecise because model predictions are generally overconfident. To address these issues, we propose an object hallucination detection method based on evidential conflict. Specifically, we treat the features in the last layer of the transformer decoder as pieces of evidence and combine them with Dempster's rule, following the approach of [6]. This allows us to detect hallucinations by evaluating the conflict among the evidence. Preliminary experiments on a state-of-the-art LVLM, mPLUG-Owl2, show that our approach improves over baseline methods, particularly for highly uncertain inputs.
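To illustrate the evidential-conflict idea, the sketch below shows Dempster's rule of combination and its conflict coefficient K on a toy binary frame {hallucinated, faithful}. The frame, the mass values, and the function `combine_dempster` are illustrative assumptions only; how mass functions are actually derived from the decoder's last-layer features follows [6] and is not reproduced here.

```python
# Minimal sketch of Dempster's rule on a binary frame {H (hallucinated), F (faithful)}.
# Mass functions are dicts over the focal sets 'H', 'F', and 'HF' (the whole frame,
# i.e. ignorance). The numeric masses below are placeholders standing in for
# per-feature evidence; they are NOT the paper's actual evidence construction.

def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule; return (combined masses, conflict K)."""
    focal = ['H', 'F', 'HF']
    unnormalized = {a: 0.0 for a in focal}
    conflict = 0.0
    for a in focal:
        for b in focal:
            prod = m1[a] * m2[b]
            inter = set(a) & set(b)
            if not inter:
                conflict += prod  # empty intersection contributes to the conflict mass K
            else:
                key = 'HF' if inter == {'H', 'F'} else inter.pop()
                unnormalized[key] += prod
    if conflict >= 1.0:
        raise ValueError("Total conflict: the mass functions cannot be combined.")
    combined = {a: v / (1.0 - conflict) for a, v in unnormalized.items()}
    return combined, conflict

# Example: one piece of evidence leans toward hallucination, another toward
# faithfulness; a large conflict K would flag the described object as suspicious.
m_a = {'H': 0.7, 'F': 0.1, 'HF': 0.2}
m_b = {'H': 0.1, 'F': 0.6, 'HF': 0.3}
combined, K = combine_dempster(m_a, m_b)
print(f"combined masses: {combined}, conflict K = {K:.3f}")
```

In this toy example K is roughly 0.43, reflecting the disagreement between the two sources; in the detection setting, the conflict among the combined feature-level evidence, rather than the normalized masses themselves, serves as the hallucination score.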