Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG

13 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Multimodal Vision, Image Retrieval, Security, Language and Vision, Adversarial Patch
TL;DR: We propose a training-free retrieval-augmented framework that uses vision-language models to detect diverse adversarial patches with state-of-the-art accuracy.
Abstract: Adversarial patch attacks pose a major threat to vision systems by embedding localized perturbations that mislead deep models. Traditional defense methods often require retraining or fine-tuning, making them impractical for real-world deployment. We propose a training-free Visual Retrieval-Augmented Generation (VRAG) framework that integrates Vision-Language Models (VLMs) for adversarial patch detection. By retrieving visually similar patches and images that resemble stored attacks in a continuously expanding database, VRAG performs generative reasoning to identify diverse attack types, all without additional training or fine-tuning. We extensively evaluate open-source large-scale VLMs (Qwen-VL-Plus, Qwen2.5-VL-72B, and UI-TARS-72B-DPO) alongside Gemini-2.0, a closed-source model. Notably, the open-source UI-TARS-72B-DPO model achieves up to 95% classification accuracy, setting a new state of the art for open-source adversarial patch detection. Gemini-2.0 attains the highest overall accuracy, 98%, but remains closed-source. Experimental results demonstrate VRAG’s effectiveness in identifying a variety of adversarial patches with minimal human annotation, paving the way for robust, practical defenses against evolving adversarial patch attacks.
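The retrieve-then-reason flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding vectors, the `PatchDatabase` class, the cosine-similarity ranking, and the prompt wording are all assumptions made for illustration, and the call to an actual VLM is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class PatchDatabase:
    """Hypothetical store of (embedding, label) pairs that can grow over time."""

    def __init__(self):
        self.entries = []

    def add(self, embedding, label):
        # New attack examples are appended; nothing is retrained.
        self.entries.append((list(embedding), label))

    def retrieve_top_k(self, query, k=3):
        # Rank every stored entry by similarity to the query embedding.
        scored = [(label, cosine(query, emb)) for emb, label in self.entries]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


def build_prompt(neighbors):
    """Assemble a text prompt listing retrieved neighbors as context (assumed format)."""
    lines = ["Retrieved reference examples:"]
    for i, (label, score) in enumerate(neighbors, start=1):
        lines.append(f"{i}. label={label}, similarity={score:.3f}")
    lines.append("Does the query image contain an adversarial patch? Answer yes or no.")
    return "\n".join(lines)


if __name__ == "__main__":
    db = PatchDatabase()
    # Toy 2-D embeddings; a real system would use an image encoder (assumption).
    db.add([1.0, 0.0], "adversarial patch")
    db.add([0.0, 1.0], "benign")
    db.add([0.9, 0.1], "adversarial patch")
    neighbors = db.retrieve_top_k([1.0, 0.05], k=2)
    print(build_prompt(neighbors))
```

In this sketch, extending coverage to a new attack type only requires another `db.add(...)` call, which is one way to read the "continuously expanding database" and "training-free" claims; the prompt would then be passed, together with the query image, to whichever VLM is being evaluated.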
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 4696