TL;DR: A generalizable AI-generated image detection method with pixelwise decomposition residuals is proposed.
Abstract: Fake images, created by recently advanced generative models, have become increasingly indistinguishable from real ones, making their detection crucial, urgent, and challenging. This paper introduces PiD (Pixelwise Decomposition Residuals), a novel detection method that focuses on residual signals within images. Generative models are designed to optimize high-level semantic content (principal components), often overlooking low-level signals (residual components). PiD leverages this observation by disentangling residual components from images, encouraging the model to uncover more underlying and general forgery clues independent of semantic content. Compared to prior approaches that rely on reconstruction techniques or high-frequency information, PiD is computationally efficient and does not rely on any generative models for reconstruction. Specifically, PiD operates at the pixel level: it maps each pixel vector to another color space (e.g., YUV), quantizes the vector, and maps it back to the RGB space; the quantization loss is taken as the residual for AIGC detection. Our experimental results are striking: PiD achieves 98% accuracy on the widely used GenImage benchmark, highlighting its effectiveness and generalization performance.
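The residual extraction described above can be sketched as follows. This is a minimal illustration under our own assumptions: we use the BT.601 RGB↔YUV transform and a uniform quantizer with a hypothetical `levels` hyperparameter; the paper's exact color space, quantization scheme, and bit depth may differ.

```python
import numpy as np

# BT.601 RGB -> YUV transform matrix (an assumed choice of color space)
_RGB2YUV = np.array([[0.299, 0.587, 0.114],
                     [-0.14713, -0.28886, 0.436],
                     [0.615, -0.51499, -0.10001]])
_YUV2RGB = np.linalg.inv(_RGB2YUV)


def pid_residual(img_rgb: np.ndarray, levels: int = 32) -> np.ndarray:
    """Pixelwise decomposition residual (sketch).

    img_rgb: float array in [0, 1], shape (H, W, 3).
    levels: number of uniform quantization levels per YUV channel
            (a hypothetical hyperparameter, not from the paper).
    Returns the quantization loss mapped back to RGB space.
    """
    # Map each pixel vector to YUV
    yuv = img_rgb @ _RGB2YUV.T
    # Uniformly quantize each channel
    step = 1.0 / (levels - 1)
    yuv_q = np.round(yuv / step) * step
    # Map the quantized vector back to RGB
    recon = yuv_q @ _YUV2RGB.T
    # The quantization loss is the residual used for detection
    return img_rgb - recon
```

The residual strips most semantic content (the quantized reconstruction retains it) while preserving the low-level signal that, per the paper's observation, generative models tend to overlook; a detector is then trained on this residual instead of the raw image.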
Lay Summary: In this work, we try to teach neural networks to distinguish fake images from real ones without relying on obvious content flaws. Fake images are produced by advanced generative models, such as the widely used Stable Diffusion and Flux models. The quality of generated images varies: some look very similar to real images, while others contain content flaws (such as six fingers on a human hand). Distinguishing fake images with obvious content flaws or artifacts is easy for both AI and humans. However, we want AI to also handle hard cases that have no obvious flaws.
Therefore, we propose changing the input to the neural network and design a simple way to remove the semantic content, including any content flaws, from the image. The remaining residual component cannot be easily classified by humans, so we feed it to the neural network and study whether the network can use it for generalized AIGC detection. Interestingly, we found that this simple approach effectively improves the ability of neural networks to distinguish fake images produced by different generative models. The method can be easily integrated into existing pipelines and helps prevent malicious applications of generative models.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Computer Vision
Keywords: AIGC detection, Generative models, Deepfakes, Deep learning, Image representation, Low-level vision
Submission Number: 15311