Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect

ICLR 2026 Conference Submission17186 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Readers: Everyone | License: CC BY 4.0
Keywords: Synthetic Image Detection, Cross-Generator Generalization, Cross-Domain Generalization, Diffusion Models, Feature Fusion, Foundation Models, Benchmark Dataset
TL;DR: For AI-generated image detection, we introduce a more complete definition of "generalization" along two axes: cross-generator and cross-semantic. We provide a state-of-the-art evaluation benchmark (OmniGen) and a novel fundamental detector (FusionDetect).
Abstract: The rapid development of generative models has made it increasingly important to build detectors that reliably identify synthetic images. Although most prior work has focused on cross-generator generalization, we argue that this viewpoint is too limited. Detecting synthetic images involves another equally important challenge: generalization across visual domains. To bridge this gap, we present the OmniGen Benchmark, a comprehensive evaluation dataset that incorporates 12 state-of-the-art generators and allows detector performance to be evaluated under more realistic conditions. In addition, we introduce a new method, FusionDetect, aimed at addressing both axes of generalization. FusionDetect draws on the benefits of two frozen foundation models: CLIP and DINOv2. By deriving features from these two complementary models, we build a cohesive feature space that naturally adapts to changes in both image content and generator design. Our extensive experiments demonstrate that FusionDetect not only sets a new state of the art, with 3.87% higher accuracy than its closest competitor and 6.13% higher precision on average across established benchmarks, but also achieves a 4.48% increase in accuracy on OmniGen, along with exceptional robustness to common image perturbations. We introduce not only a top-performing detector, but also a new benchmark and framework for advancing universal AI image detection.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17186
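
Illustrative sketch (not from the submission): the abstract states that FusionDetect derives features from two frozen foundation models and combines them into one feature space, but it does not say how the features are fused or what classifier sits on top. The Python below is a minimal sketch under stated assumptions: fusion is taken to be L2-normalization followed by concatenation, the classifier is a plain logistic-regression head, and random arrays stand in for the CLIP and DINOv2 embeddings. The names `fuse_features`, `train_linear_head`, and `predict` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so neither encoder dominates the fusion."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def fuse_features(feats_a, feats_b):
    """Concatenate per-image embeddings from two frozen encoders.

    Assumption: concatenation is only one plausible fusion rule; the
    abstract does not specify the one the authors use.
    """
    if feats_a.shape[0] != feats_b.shape[0]:
        raise ValueError("both encoders must embed the same number of images")
    return np.concatenate([l2_normalize(feats_a), l2_normalize(feats_b)], axis=1)


def train_linear_head(x, y, lr=0.5, steps=500):
    """Fit a logistic-regression head with batch gradient descent.

    Only this head is trained; the encoders that produced x stay frozen.
    """
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(fake)
        grad = p - y                             # gradient of the log loss w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(x, w, b):
    """Return 1 (fake) where the logit is positive, else 0 (real)."""
    return (x @ w + b > 0).astype(int)


# Stand-in data: random vectors play the role of frozen-encoder embeddings.
# The dimensions 8 and 6 are arbitrary and unrelated to the real models.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)                           # 0 = real, 1 = fake
emb_a = rng.normal(size=(200, 8)) + 3.0 * labels[:, None]  # placeholder for CLIP features
emb_b = rng.normal(size=(200, 6)) + 3.0 * labels[:, None]  # placeholder for DINOv2 features

fused = fuse_features(emb_a, emb_b)
w, b = train_linear_head(fused, labels)
train_accuracy = (predict(fused, w, b) == labels).mean()
```

In a real pipeline the two placeholder arrays would be replaced by image embeddings computed once from the frozen encoders; normalizing each block before concatenation is a design choice that keeps the two feature scales comparable.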