Triggering Generative Collapse: A Contrastive Inversion Framework for AI-Generated Image Detection

ICLR 2026 Conference Submission 7203 Authors

16 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: AI-Generated Image Detection, Image Forensics, Diffusion Model, Latent Space Perturbation, Contrastive Reconstruction, Generative Collapse, Out-of-Distribution Detection
Abstract: With the rapid evolution of generative models, AI-generated synthetic content has become increasingly realistic and difficult to detect, posing serious threats to information integrity. Unlike authentic samples, which align with well-formed data distributions, manipulated samples often fall off the underlying data manifold. The foundational insight of our work is that a diffusion model's response to such off-manifold inputs can be deliberately engineered. We demonstrate that targeted contrastive fine-tuning causes subtle latent deviations to consistently produce structured reconstruction failures. Leveraging this engineered sensitivity, we introduce the Contrastive Reconstruction Amplification Forensics Technique (CRAFT), a method that actively induces generative collapse for forgeries while maintaining faithful reconstruction for authentic content. The core of our approach is a contrastive reconstruction loss that fine-tunes the terminal stages of the DDIM denoising process. This optimization fundamentally alters the model's dynamics, engineering a deliberately fragile and asymmetric generative manifold: authentic latents are guided toward high-fidelity reconstructions, while any latent perceived as off-manifold is aggressively pushed toward a state of collapse. To further amplify this divergence, our framework employs a class-guided latent perturbation that pushes all inputs away from the real-class center, effectively steering forged latents into the "failure regions" engineered by our fine-tuning. The result is a predictable structural collapse of forged samples into distinctive artifacts (e.g., honeycomb patterns), exposing a failure mode rooted in the model's divergent Jacobian dynamics. This induced collapse yields a strong and interpretable authenticity cue that enables robust detection. Extensive experiments demonstrate that our method achieves superior performance and cross-domain generalization.
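The two mechanisms named in the abstract, an asymmetric contrastive reconstruction objective and a class-guided perturbation away from the real-class center, can be illustrated with a minimal sketch. The paper does not specify its loss form or perturbation rule, so the hinge-style loss, the function names, and the `margin`/`eps` parameters below are all illustrative assumptions, not the authors' implementation; real usage would operate on DDIM latents via a diffusion library rather than raw numpy arrays.

```python
import numpy as np

def contrastive_reconstruction_loss(recon_real, real, recon_fake, fake, margin=1.0):
    """Illustrative asymmetric objective (assumed form, not from the paper):
    pull authentic reconstructions toward their inputs (low error) while
    pushing forged reconstruction error above a margin (hinge term)."""
    err_real = np.mean((recon_real - real) ** 2)   # minimize for authentic samples
    err_fake = np.mean((recon_fake - fake) ** 2)   # encourage collapse for forgeries
    return err_real + max(0.0, margin - err_fake)

def class_guided_perturbation(latent, real_center, eps=0.1):
    """Illustrative class-guided step: nudge every latent away from the
    real-class center, so off-manifold (forged) latents drift further
    into the engineered failure regions."""
    direction = latent - real_center
    norm = np.linalg.norm(direction) + 1e-8        # avoid division by zero
    return latent + eps * direction / norm
```

A real pipeline would apply the perturbation in the diffusion latent space before inversion, then classify by reconstruction error: authentic inputs reconstruct faithfully, while perturbed forged latents collapse into the distinctive artifacts described above.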
Primary Area: generative models
Submission Number: 7203