SemGIR: Semantic-Guided Image Regeneration based method for AI-generated Image Detection and Attribution

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: The rapid development of image generative models has lowered the barrier to image creation but also raised security concerns about the spread of false information, making detection technologies for AI-generated images urgently necessary. Text-to-image generation is currently the predominant approach to image generation, in which the appearance of a generated image hinges on two primary factors: the text prompt and the inherent characteristics of the model. However, the diversity of semantic text prompts yields widely varying generated images, posing significant challenges to existing detection methods that rely solely on learned image features, particularly in scenarios with limited samples. To tackle these challenges, this paper presents a novel perspective on the AI-generated image detection task, advocating detection under semantic-decoupling conditions. Building on this insight, we propose SemGIR, a semantic-guided image-regeneration-based method for AI-generated image detection. SemGIR first regenerates an image through an image-to-text step followed by a text-to-image generation step, then uses the resulting image pairs to derive discriminative features. This regeneration process naturally decouples semantic features, allowing detection to concentrate on the inherent characteristics of the generative model. The same efficient scheme can also be applied to attribution. Experiments show that in realistic scenarios with limited samples, SemGIR achieves an average detection accuracy 15.76% higher than state-of-the-art (SOTA) methods. Furthermore, in attribution experiments on the SDv2.1 model, SemGIR attains an accuracy exceeding 98%, confirming the effectiveness and practical utility of the proposed method.
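The regeneration step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `captioner` (image-to-text) and `generator` (text-to-image) callables are hypothetical stand-ins for models such as a captioning network and a diffusion model, and the residual feature shown here is only one plausible way to pair the original with its regeneration.

```python
import numpy as np

def semgir_features(image, captioner, generator):
    """Hypothetical sketch of SemGIR's semantic-guided regeneration.

    image:     H x W x C array.
    captioner: callable, image -> text prompt (image-to-text step).
    generator: callable, text prompt -> image (text-to-image step).

    Returns the residual between the original image and its
    regeneration. Because both images share the same semantic content
    (the caption), the residual is intended to suppress semantics and
    retain model-specific traces for a downstream detector/attributor.
    """
    caption = captioner(image)        # image-to-text
    regenerated = generator(caption)  # text-to-image
    # Paired residual: semantics cancel, generator fingerprints remain.
    return image.astype(np.float32) - regenerated.astype(np.float32)
```

In practice the pair (original, regenerated) or this residual would be fed to a learned classifier for detection or source attribution; the exact feature construction in the paper may differ.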
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Multimodal Fusion, [Generation] Generative Multimedia, [Generation] Social Aspects of Generative AI
Relevance To Conference: This paper discusses the importance of decoupling semantics from images when detecting generated images, offering a novel approach to joint detection across the text and image modalities. The proposed method exhibits greater robustness and architectural generality in real-world scenarios.
Supplementary Material: zip
Submission Number: 1861
