Abstract: Adversarial images that can fool deep neural networks have drawn researchers' attention to the security of machine learning. In this paper, we employ a blind forensic method to detect adversarial images generated by gradient-based attacks, including FGSM, BIM, RFGSM, and PGD. By analyzing adversarial images, we find that gradient-based attacks cause significant statistical changes in the image difference domain. Moreover, gradient-based attacks add different perturbations to the R, G, and B channels, which inevitably changes the dependencies among these channels. To measure those dependencies, we employ the \(3^{rd}\)-order co-occurrence to construct the feature. Unlike previous works, which extract the co-occurrence within each channel, we extract co-occurrences across the \(1^{st}\)-order differences of the R, G, and B channels to capture changes in inter-channel dependence. Because attacks shift the difference elements, some co-occurrence elements of adversarial images take distinctly larger values than those of legitimate images. Experimental results demonstrate that the proposed method performs stably across different attack types and attack strengths, achieving detection accuracy of up to 99.9%, substantially exceeding the state of the art.
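The cross-channel co-occurrence feature described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the truncation threshold `T`, the horizontal difference direction, and the normalization are assumptions made for the sketch.

```python
import numpy as np

def cross_channel_cooccurrence(img, T=3):
    """Co-occurrence histogram of (dR, dG, dB) triples taken from the
    1st-order horizontal differences of the three color channels.

    img : uint8 array of shape (H, W, 3)
    T   : truncation threshold (assumed value; not from the paper)
    Returns a normalized feature vector of length (2*T + 1)**3.
    """
    img = img.astype(np.int16)                 # avoid uint8 wrap-around
    d = img[:, 1:, :] - img[:, :-1, :]         # 1st-order horizontal difference
    d = np.clip(d, -T, T) + T                  # truncate, shift to [0, 2T]
    bins = 2 * T + 1
    # One 3rd-order co-occurrence bin per (dR, dG, dB) triple at each pixel,
    # i.e. the co-occurrence is taken ACROSS channels, not within one channel.
    idx = d[..., 0] * bins * bins + d[..., 1] * bins + d[..., 2]
    feat = np.bincount(idx.ravel(), minlength=bins ** 3)
    return feat / feat.sum()                   # normalized feature vector
```

The resulting vector can then be fed to any standard classifier; because gradient-based perturbations shift many difference values toward the truncation bounds, certain bins grow markedly larger for adversarial images.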