Keywords: vision-language models, hallucination detection, late interaction, multi-vector retrieval, trustworthy AI
Abstract: Vision-language models frequently hallucinate objects and attributes that lack grounding in the source image. Most detection methods require white-box access to model internals or task-specific training data, limiting their use with API-served or rapidly evolving VLMs. We observe that retrieval relevance scores, designed to measure query-document match, transfer directly to measuring claim-image grounding. PatchTrust applies this insight: it decomposes a VLM response into atomic claims and scores each claim against the source image using the late interaction MaxSim mechanism from a frozen ColPali encoder. Each claim token is matched to its best-aligned image patch. Claims that find no strong patch-level support are flagged as hallucinations. PatchTrust requires no training and no access to model internals, yet consistently outperforms single-vector similarity baselines and closes the gap to white-box detectors across five evaluation settings and two VLMs.
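The late interaction scoring described in the abstract follows the standard ColBERT/ColPali MaxSim formulation. Below is a minimal sketch of how a claim could be scored against image patches; the function names, the normalization assumption, and the threshold-based flagging are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def maxsim_score(claim_embs: torch.Tensor, patch_embs: torch.Tensor) -> float:
    """Late interaction (MaxSim) relevance between one atomic claim and an image.

    claim_embs: (num_claim_tokens, dim) token embeddings of the claim
    patch_embs: (num_patches, dim) patch embeddings of the source image
    Both are assumed to be L2-normalized, as in ColBERT/ColPali-style encoders.
    """
    # Cosine similarity between every claim token and every image patch.
    sim = claim_embs @ patch_embs.T  # (num_claim_tokens, num_patches)
    # Each claim token keeps only its best-aligned patch; token maxima are summed.
    return sim.max(dim=1).values.sum().item()

def flag_hallucination(claim_embs: torch.Tensor,
                       patch_embs: torch.Tensor,
                       threshold: float = 0.5) -> bool:
    """Flag a claim as hallucinated when its length-normalized MaxSim score
    falls below a threshold (threshold value is a placeholder assumption)."""
    score = maxsim_score(claim_embs, patch_embs) / claim_embs.shape[0]
    return score < threshold
```

The normalization by claim length keeps scores comparable across claims of different token counts; how the paper actually calibrates or thresholds the score is not specified here.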
Submission Number: 2