Keywords: conflict detection, multimodal LLMs, benchmark dataset
TL;DR: We present CrossCheck, a benchmark for multimodal conflict detection using images paired with controlled contradictory captions.
Abstract: Contradictory multimodal inputs are common in real-world settings, yet existing benchmarks typically assume input consistency and fail to evaluate cross-modal conflict detection -- a fundamental capability for preventing hallucinations and ensuring reliability.
We introduce CrossCheck, a benchmark for multimodal conflict detection that pairs COCO images with contradictory captions containing controlled object-level or attribute-level conflicts. Each sample includes targeted questions evaluated in both multiple-choice and open-ended formats. The benchmark provides an extensive fine-tuning set filtered through automated quality checks, alongside a smaller human-verified diagnostic set. Our analysis of state-of-the-art models reveals substantial limitations in recognizing cross-modal contradictions, exposing systematic modality biases and category-specific weaknesses. Furthermore, we empirically demonstrate that targeted fine-tuning on CrossCheck substantially enhances conflict detection capabilities.
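To make the sample structure described in the abstract concrete, here is a minimal sketch of what a single CrossCheck-style record might look like, assuming a JSON-like format; all field names and values are illustrative assumptions for exposition, not the benchmark's actual schema.

```python
# Hypothetical illustration of one CrossCheck-style sample. Field names and
# values are assumptions for exposition, not the benchmark's actual schema.
sample = {
    "image_id": "COCO_val2014_000000123456",  # assumed COCO image identifier
    "caption": "A man rides a red bicycle.",  # contradictory caption for the image
    "conflict_type": "attribute",             # "object" or "attribute", per the abstract
    "conflict_detail": {                      # assumed annotation of the controlled conflict
        "attribute": "color",
        "image_value": "blue",
        "caption_value": "red",
    },
    "question": "What color is the bicycle?",  # targeted question about the conflict
    "choices": [                               # multiple-choice format
        "red",
        "blue",
        "the image and caption disagree",
        "there is no bicycle",
    ],
    "open_ended": True,  # the same question is also posed in open-ended form
}
```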
Primary Area: datasets and benchmarks
Submission Number: 4768