Keywords: vision-language models, hallucination detection, late interaction, multi-vector retrieval, trustworthy AI
Abstract: Vision-language models frequently hallucinate objects and attributes that lack grounding in the source image. Most detection methods require white-box access to model internals or task-specific training data, limiting their use with API-served or rapidly evolving VLMs. We observe that retrieval relevance scores, designed to measure query-document match, transfer directly to measuring claim-image grounding. PatchTrust applies this insight: it decomposes a VLM response into atomic claims and scores each claim against the source image using the late interaction MaxSim mechanism from a frozen ColPali encoder. Each claim token is matched to its best-aligned image patch. Claims that find no strong patch-level support are flagged as hallucinations. PatchTrust requires no training and no access to model internals, yet consistently outperforms single-vector similarity baselines and closes the gap to white-box detectors across five evaluation settings and two VLMs.
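The late interaction scoring described in the abstract follows the standard ColBERT/ColPali MaxSim formulation. Below is a minimal sketch of how a claim could be scored against image patches; the function names, the normalization assumption, and the threshold-based flagging are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def maxsim_score(claim_embs: torch.Tensor, patch_embs: torch.Tensor) -> float:
    """Late interaction (MaxSim) relevance between one atomic claim and an image.

    claim_embs: (num_claim_tokens, dim) token embeddings of the claim
    patch_embs: (num_patches, dim) patch embeddings of the source image
    Both are assumed to be L2-normalized, as in ColBERT/ColPali-style encoders.
    """
    # Cosine similarity between every claim token and every image patch.
    sim = claim_embs @ patch_embs.T  # (num_claim_tokens, num_patches)
    # Each claim token keeps only its best-aligned patch; token maxima are summed.
    return sim.max(dim=1).values.sum().item()

def flag_hallucination(claim_embs: torch.Tensor,
                       patch_embs: torch.Tensor,
                       threshold: float = 0.5) -> bool:
    """Flag a claim as hallucinated when its length-normalized MaxSim score
    falls below a threshold (threshold value is a placeholder assumption)."""
    score = maxsim_score(claim_embs, patch_embs) / claim_embs.shape[0]
    return score < threshold
```

The normalization by claim length keeps scores comparable across claims of different token counts; how the paper actually calibrates or thresholds the score is not specified here.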
Submission Number: 2