AI’s Visual Blind Spot: Benchmarking MLLMs on Visually Smuggled Threats

10 Sept 2025 (modified: 06 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: MLLMs; Visually Smuggled Threats; MLLM Safety; OCR
Abstract: Visual Smuggling Threats (VSTs) spread illicit information by embedding concealed or encrypted text within seemingly innocuous images, adversarially evading automated moderation and proliferating across online platforms. Yet the effectiveness of recent Multimodal Large Language Models (MLLMs) at identifying VSTs to safeguard online security remains underexplored. To bridge this gap, we construct VST-Bench, a benchmark for comprehensively evaluating models' ability to detect diverse VSTs. It covers three major challenges, i.e., Perceptual Difficulty, Reasoning Traps, and AI Illusion, further divided into ten subcategories, and includes 3,400 high-quality samples collected from real smuggling scenarios or synthesized by replicating smuggling workflows. Evaluating 29 mainstream MLLMs on VST-Bench shows that existing models perform poorly at judging violative images: the state-of-the-art open-source model Gemma-3-27B achieves only 32.67% F1 on the challenging AI Blended Background category, and even the proprietary Gemini-2.5 Pro reaches just 46.32%, indicating that current MLLMs are far from reliably preventing the spread of harmful content in real-world deployment. Through an in-depth analysis of failure cases, we identify three core challenges posed by VSTs: (1) Perceptual Failure on Subtle Threats, (2) Reasoning Failure on Semantic Puzzles, and (3) Recognition Failure against AI Illusions. We will release the dataset and evaluation code of VST-Bench to facilitate further research on VSTs and broader online risk-content recognition.
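The abstract reports per-category F1 over violative-image detection. As a minimal sketch of how such scoring might work, assuming the benchmark frames detection as binary violative/benign classification (the field names, data format, and evaluation framing below are illustrative assumptions, not the authors' released evaluation code):

```python
# Hypothetical per-category F1 scoring for a binary violative/benign
# benchmark like VST-Bench. Sample fields are assumed, not the paper's.
from collections import defaultdict

def f1(tp: int, fp: int, fn: int) -> float:
    """Standard F1 = 2*TP / (2*TP + FP + FN); returns 0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def per_category_f1(samples):
    """samples: iterable of dicts with 'category' (str),
    'label' (1 = violative) and 'pred' (1 = model flags violative)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for s in samples:
        c = counts[s["category"]]
        if s["pred"] and s["label"]:
            c["tp"] += 1
        elif s["pred"] and not s["label"]:
            c["fp"] += 1
        elif not s["pred"] and s["label"]:
            c["fn"] += 1
    return {cat: f1(c["tp"], c["fp"], c["fn"]) for cat, c in counts.items()}

# Example: two violative samples in one category, one missed by the model.
print(per_category_f1([
    {"category": "AI Blended Background", "label": 1, "pred": 1},
    {"category": "AI Blended Background", "label": 1, "pred": 0},
]))  # {'AI Blended Background': 0.666...}
```

Under this framing, a low F1 such as the reported 32.67% on AI Blended Background reflects the model either missing concealed text (false negatives) or flagging benign images (false positives).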
Primary Area: datasets and benchmarks
Submission Number: 3572