SemGIR: Semantic-Guided Image Regeneration based method for AI-generated Image Detection and Attribution

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: The rapid development of image generative models has lowered the barrier to image creation but also raised security concerns about the spread of false information, making detection technologies for AI-generated images urgently necessary. Text-to-image generation is currently the predominant approach to image generation, in which the appearance of a generated image hinges on two primary factors: the text prompt and the inherent characteristics of the model. However, the diversity of semantic text prompts yields widely varying generated images, posing significant challenges to existing detection methods that rely solely on learned image features, particularly in scenarios with limited samples. To tackle these challenges, this paper presents a novel perspective on the AI-generated image detection task, advocating detection under semantic-decoupling conditions. Building on this insight, we propose SemGIR, a semantic-guided image-regeneration-based method for AI-generated image detection. SemGIR first regenerates an image through an image-to-text step followed by a text-to-image generation step, then uses the resulting image pairs to derive discriminative features. This regeneration process naturally decouples semantic features, allowing detection to concentrate on the inherent characteristics of the generative model. The same efficient scheme can also be applied to attribution. Experiments show that in realistic scenarios with limited samples, SemGIR achieves an average detection accuracy 15.76% higher than state-of-the-art (SOTA) methods. Furthermore, in attribution experiments on the SDv2.1 model, SemGIR attains an accuracy exceeding 98%, confirming the effectiveness and practical utility of the proposed method.
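The regeneration step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `captioner` (image-to-text) and `generator` (text-to-image) callables are hypothetical stand-ins for models such as a captioning network and a diffusion model, and the residual feature shown here is only one plausible way to pair the original with its regeneration.

```python
import numpy as np

def semgir_features(image, captioner, generator):
    """Hypothetical sketch of SemGIR's semantic-guided regeneration.

    image:     H x W x C array.
    captioner: callable, image -> text prompt (image-to-text step).
    generator: callable, text prompt -> image (text-to-image step).

    Returns the residual between the original image and its
    regeneration. Because both images share the same semantic content
    (the caption), the residual is intended to suppress semantics and
    retain model-specific traces for a downstream detector/attributor.
    """
    caption = captioner(image)        # image-to-text
    regenerated = generator(caption)  # text-to-image
    # Paired residual: semantics cancel, generator fingerprints remain.
    return image.astype(np.float32) - regenerated.astype(np.float32)
```

In practice the pair (original, regenerated) or this residual would be fed to a learned classifier for detection or source attribution; the exact feature construction in the paper may differ.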
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Multimodal Fusion, [Generation] Generative Multimedia, [Generation] Social Aspects of Generative AI
Relevance To Conference: This paper discusses the importance of decoupling semantics from images when detecting generated images, offering a novel approach to joint detection across the text and image modalities. The proposed method exhibits greater robustness and architectural generality in real-world scenarios.
Supplementary Material: zip
Submission Number: 1861
