RAID: A Benchmark Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors

20 Sept 2025 (modified: 21 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: AI-generated image detection, adversarial robustness
Abstract: AI-generated images have reached a quality level at which humans can no longer reliably distinguish them from real images. To counteract the resulting risk of fraud and disinformation, the detection of AI-generated images is a pressing challenge and an active research topic. While many of the proposed methods claim to achieve high detection accuracy, they are usually evaluated under idealized conditions. In particular, their $\textit{adversarial robustness}$ is often neglected, potentially due to a lack of awareness or the substantial effort required to conduct a comprehensive robustness analysis. In this work, we tackle this problem by providing a simpler means to assess the robustness of AI-generated image detectors. We present $\textbf{RAID}$ ($\textbf{R}$obust evaluation of $\textbf{AI}$-generated image $\textbf{D}$etectors), a benchmark dataset of 72k diverse and highly transferable adversarial examples. The dataset is created by running attacks against an ensemble of seven state-of-the-art detectors on images generated by four different text-to-image models. Extensive experiments show that our methodology produces adversarial images that transfer to unseen detectors with a high success rate, allowing RAID to quickly provide an approximate yet reliable estimate of a detector's adversarial robustness. Our findings indicate that current state-of-the-art AI-generated image detectors are easily deceived by adversarial examples, highlighting the critical need for more robust detection methods.
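The abstract describes generating transferable adversarial examples by attacking an ensemble of detectors at once. The sketch below illustrates that general idea with an $\ell_\infty$ PGD attack that minimizes the averaged detection loss across the ensemble; the detector interface (one logit per image), the loss choice, and the hyperparameters (`eps`, `alpha`, `steps`) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def ensemble_pgd(image, detectors, eps=4/255, alpha=1/255, steps=10):
    """L-inf PGD that pushes AI-generated images toward the 'real' class
    for every detector in the ensemble simultaneously (illustrative sketch)."""
    adv = image.clone().detach()
    # Hypothetical label convention: each detector outputs one logit per
    # image, with 0 = "real" and 1 = "AI-generated".
    target = torch.zeros(image.size(0), device=image.device)
    for _ in range(steps):
        adv.requires_grad_(True)
        # Average the binary cross-entropy toward "real" over all detectors,
        # so the perturbation has to fool the whole ensemble, which is what
        # encourages transfer to unseen detectors.
        loss = sum(
            F.binary_cross_entropy_with_logits(d(adv).squeeze(1), target)
            for d in detectors
        ) / len(detectors)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv - alpha * grad.sign()               # step toward "real"
            adv = image + (adv - image).clamp(-eps, eps)  # project into eps-ball
            adv = adv.clamp(0, 1).detach()                # keep valid pixel range
    return adv
```

Because each perturbed image stays within a small $\ell_\infty$ ball around the original, it remains visually near-identical while flipping the ensemble's prediction, matching the abstract's claim that the examples are both diverse and highly transferable.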
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 23996