Abstract: AI-generated images are increasingly prevalent on the web, raising concerns about the real-world applicability of detection methods. Current detectors perform well on benchmark datasets but degrade significantly on real-world data. We trace this gap to misalignment within benchmark datasets: discrepancies in how data from different classes are encoded or transformed lead models to learn shortcuts. These shortcuts make detectors overly reliant on factors such as image compression, which biases their predictions on real-world images that inevitably undergo compression. In this work, we reveal the misalignment in widely used benchmark datasets and demonstrate that aligning datasets improves model robustness and generalizability. Additionally, we propose leveraging pre-trained visual encoders to further enhance performance in real-world scenarios. Our approach achieves significant performance gains, highlighting the importance of dataset alignment for real-world AI-generated image detection.
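As a rough illustration of what encoding misalignment between classes can look like, the sketch below compares the file-format mix of two classes in a labeled file list. It is not the paper's method: the helper names `format_distribution` and `misalignment`, the labels `real` and `fake`, the example paths, and the use of file extensions as a proxy for encoding are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from pathlib import PurePath


def format_distribution(samples):
    """Return {label: {format: fraction}} for an iterable of (path, label) pairs.

    Assumption: the file extension is a usable proxy for how an image was encoded.
    """
    counts = defaultdict(Counter)
    for path, label in samples:
        ext = PurePath(path).suffix.lower().lstrip(".")
        # Treat .jpg and .jpeg as the same encoding.
        counts[label]["jpeg" if ext in ("jpg", "jpeg") else ext] += 1
    return {
        label: {fmt: n / sum(c.values()) for fmt, n in c.items()}
        for label, c in counts.items()
    }


def misalignment(samples, label_a="real", label_b="fake"):
    """Total variation distance between the two classes' format distributions.

    0.0 means both classes have the same format mix;
    1.0 means the classes share no format at all.
    """
    dist = format_distribution(samples)
    a, b = dist.get(label_a, {}), dist.get(label_b, {})
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


# Hypothetical file list: every real image is JPEG, every generated image is PNG.
samples = [
    ("real/0001.jpg", "real"),
    ("real/0002.jpeg", "real"),
    ("fake/0001.png", "fake"),
    ("fake/0002.png", "fake"),
]
score = misalignment(samples)
```

A score near 1.0 on a file list like this suggests that a classifier could separate the classes from encoding alone, which is the kind of shortcut the abstract describes. A fuller check would inspect actual compression parameters rather than extensions.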
External IDs: doi:10.1145/3709022.3736541