AIGID-RFT: Reinforcement Fine-Tuning Multimodal LLMs for AI-Generated Image Detection

Zheming Fan; Guopu Zhu; Zixuan Yu; Shen Wang; Ligang Wu

03 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Multimedia Forensics, MLLM

Abstract: The rapid progress of generative artificial intelligence has made AI-generated image (AIGI) detection increasingly critical for digital forensics and trustworthy media. Existing AIGI detectors are effective on raw images but are not robust to post-processing operations. Meanwhile, multimodal large language models (MLLMs) have demonstrated strong general capabilities, but their direct application to AIGI detection remains limited. To address these challenges, we propose AIGID-RFT, a novel MLLM-based AIGI detector. Unlike prior methods that rely on supervised fine-tuning, we adopt reinforcement learning as the post-training paradigm and design verifiable rewards tailored to the AIGI detection task, thereby unlocking the intrinsic potential of MLLMs. Furthermore, we design a Cross Layer Forensic Adapter that runs in parallel with the vision encoder to exploit multi-level visual features and improve detection performance. Our method requires only binary labels for training, eliminating the need for costly text annotations. Extensive experiments show that our method significantly outperforms existing AIGI detectors under diverse post-processing operations that simulate real-world scenarios.
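The abstract says only that training uses binary labels and verifiable rewards; it does not define the rewards or name the RL algorithm. The sketch below illustrates how a binary label alone can produce a checkable reward without any text annotations. It is a minimal sketch under stated assumptions, not the authors' method: the `<think>…</think><answer>real|fake</answer>` output template, the label convention (1 = AI-generated, 0 = real), the additive format reward, and the GRPO-style group normalization are all hypothetical choices made for illustration.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template (assumption): reasoning in <think>...</think>,
# followed by a one-word verdict in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>\s*(real|fake)\s*</answer>", re.IGNORECASE)
FORMAT_RE = re.compile(r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed think/answer template, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def accuracy_reward(completion: str, label: int) -> float:
    """Verifiable reward from a binary label (assumed: 1 = AI-generated, 0 = real)."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # no parseable verdict earns no accuracy reward
    predicted = 1 if match.group(1).lower() == "fake" else 0
    return 1.0 if predicted == label else 0.0


def total_reward(completion: str, label: int) -> float:
    """Hypothetical combination: accuracy plus format, each in {0, 1}."""
    return accuracy_reward(completion, label) + format_reward(completion)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style normalization over completions sampled for one image (an assumption)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three sampled completions for one AI-generated image (label = 1).
    completions = [
        "<think>Textures lack camera sensor noise.</think><answer>fake</answer>",
        "<answer>real</answer>",
        "<think>Lighting looks consistent.</think><answer>real</answer>",
    ]
    rewards = [total_reward(c, label=1) for c in completions]
    print(rewards)  # [2.0, 0.0, 1.0]
    print(group_advantages(rewards))
```

Because the accuracy reward is computed by string matching against the ground-truth label, it can be checked automatically for every rollout, which is what makes binary labels sufficient supervision in this setting.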

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 1516
