Keywords: Generated image detection, Machine unlearning, Diffusion models
Abstract: Robust detection of generated images is critical to countering the misuse of generative models. Existing methods primarily depend on learning from human-annotated training datasets, which limits their generalization to unseen distributions. In contrast, large-scale vision models (LVMs) pre-trained on web-scale datasets exhibit exceptional generalization through exposure to diverse distributions, offering a transformative paradigm for this task. However, our experimental results reveal that LVMs pre-trained exclusively on natural images capture the features of both natural and generated images equally well, achieving comparably low loss on each and thus failing to distinguish between the two. This prompts a key question: *When and how do LVMs behave differently when capturing features of natural and generated images?* Our investigation reveals a key insight: during unlearning, LVMs exhibit disparate forgetting dynamics, with feature degradation escalating faster for generated images than for natural ones. Inspired by these disparate dynamics, we introduce two detection methods: 1) data-free detection, which prunes model parameters to induce unlearning without access to any data, and 2) data-driven detection, which optimizes LVMs to unlearn knowledge tied to generated images. Extensive experiments on various benchmarks demonstrate that our unlearning-based approach outperforms conventional detection methods. By recasting detection as a machine unlearning problem, our work establishes a new paradigm for generated image detection.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 16182
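As a rough illustration of the data-free variant described in the abstract, the sketch below prunes a generic autoencoder-style vision model to induce unlearning and flags an image as generated when its reconstruction loss degrades faster than a calibrated threshold. The model interface, the use of magnitude pruning via `torch.nn.utils.prune`, the pruning ratio, and the decision threshold are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of pruning-induced unlearning for generated-image detection.
# Assumptions: `model` is an autoencoder-style LVM whose output has the same
# shape as its input; prune_ratio and threshold are placeholder values.
import copy
import torch
import torch.nn.utils.prune as prune


def feature_loss(model: torch.nn.Module, image: torch.Tensor) -> float:
    """Proxy for how well the model captures the image (reconstruction MSE)."""
    with torch.no_grad():
        recon = model(image)
        return torch.nn.functional.mse_loss(recon, image).item()


def detect_generated(model: torch.nn.Module, image: torch.Tensor,
                     prune_ratio: float = 0.3,
                     threshold: float = 0.5) -> bool:
    """Return True if loss degradation after pruning exceeds the threshold."""
    base_loss = feature_loss(model, image)

    # Induce unlearning by magnitude-pruning a fraction of the weights.
    pruned = copy.deepcopy(model)
    for module in pruned.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=prune_ratio)

    degraded_loss = feature_loss(pruned, image)

    # Generated images are expected to degrade faster under unlearning.
    relative_degradation = (degraded_loss - base_loss) / max(base_loss, 1e-8)
    return relative_degradation > threshold
```

In practice the threshold would be calibrated on held-out natural images; the sketch only conveys the decision rule implied by the disparate forgetting dynamics.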