DOBB: Decoupled Object-level Bridging with von Mises–Fisher Uncertainty for Hallucination Detection in MLLMs
Keywords: hallucination detection; object-level bridging; von Mises–Fisher distribution
TL;DR: This paper proposes a hallucination detection method for MLLMs that uses an object-level bridging mechanism and measures uncertainty with the mean resultant length, inspired by the von Mises–Fisher distribution.
Abstract: Multimodal Large Language Models (MLLMs) have advanced many vision-language tasks, but they still produce hallucinations (i.e., assertions inconsistent with the image or with facts), which undermines their reliability in high-risk applications. Existing detection approaches typically feed the image and text to the model jointly and estimate a hallucination score from the model outputs. However, because the visual module often lags behind the language module in understanding and analysis, the model can repeatedly produce similar but incorrect answers, which yields low hallucination scores and missed detections. To address this issue, we propose a simple yet effective model-agnostic method, dubbed Decoupled OBject-level Bridging (DOBB), which i) elicits richer, object-aware responses by decoupling object recognition from relational reasoning via two-step prompting (an Object-Level Bridging strategy, OLB), and ii) measures uncertainty with a von Mises–Fisher (vMF)-inspired metric (i.e., the mean resultant vector length), which is more stable than semantic-entropy-based metrics in small-sample regimes. Specifically, OLB first prompts the model to list the objects it recognizes, then asks the language model to examine and reason about the image with a chain-of-thought approach conditioned on those objects; sampling these object-bridged responses yields a richer answer space. To detect hallucinations accurately, we replace conventional entropy-based uncertainty measures with the mean resultant length from the vMF distribution, which remains robust when few samples are available and behaves more smoothly than entropy-based methods, thus avoiding their instability. Extensive experiments and ablation studies across multiple benchmarks demonstrate that DOBB consistently outperforms state-of-the-art baselines, offering a robust and generalizable solution for hallucination detection in MLLMs.
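As an illustration of the uncertainty measure named in the abstract, the following is a minimal sketch of the mean resultant length of a set of response embeddings. It is not the authors' implementation: the function names, the use of `1 - R` as the uncertainty score, and the assumption that each sampled response has already been mapped to an embedding vector are all hypothetical choices made here for clarity.

```python
import numpy as np


def mean_resultant_length(embeddings):
    """Length of the mean of the L2-normalized embeddings, in [0, 1].

    Values near 1 mean the sampled responses point in nearly the same
    direction; values near 0 mean they are spread over the sphere.
    """
    x = np.asarray(embeddings, dtype=float)
    if x.ndim != 2 or x.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of shape (n, d)")
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("zero-norm embedding cannot be projected to the sphere")
    units = x / norms  # project each embedding onto the unit sphere
    return float(np.linalg.norm(units.mean(axis=0)))


def uncertainty_score(embeddings):
    """Hypothetical score (an assumption, not from the paper): 1 - R."""
    return 1.0 - mean_resultant_length(embeddings)


if __name__ == "__main__":
    # Toy embeddings standing in for sampled responses (illustrative only).
    agreeing = [[1.0, 0.0], [2.0, 0.0], [0.5, 0.0]]
    conflicting = [[1.0, 0.0], [-1.0, 0.0]]
    print(uncertainty_score(agreeing))
    print(uncertainty_score(conflicting))
```

Under these assumptions, responses whose embeddings share a direction give a score of 0, while responses pointing in opposite directions give a score of 1. The statistic is a simple average, so it is defined for any number of samples, including very small ones.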
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2026/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 24364