AIGID-RFT: Reinforcement Fine-Tuning Multimodal LLMs for AI-Generated Image Detection

ICLR 2026 Conference Submission 1516 Authors

03 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Multimedia Forensics, MLLM
Abstract: The rapid progress of generative artificial intelligence has made AI-generated image (AIGI) detection increasingly critical for digital forensics and trustworthy media. Existing AIGI detectors are effective on raw images but lack robustness against post-processing operations. Meanwhile, multimodal large language models (MLLMs) have demonstrated strong general capabilities, but their direct application to AIGI detection remains limited. To address these challenges, we propose AIGID-RFT, a novel MLLM-based AI-generated image detector. Unlike prior methods that rely on supervised fine-tuning, we adopt reinforcement learning as the post-training paradigm and design verifiable rewards tailored for the AIGI detection task, thereby unlocking the intrinsic potential of MLLMs. Furthermore, we carefully design a Cross Layer Forensic Adapter, which is integrated in parallel with the vision encoder to effectively exploit multi-level visual features for enhanced detection performance. Our method requires only binary labels for training, eliminating the need for costly text annotations. Extensive experiments demonstrate that our method significantly outperforms existing AIGI detectors under diverse post-processing operations that simulate real-world scenarios.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1516
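
The abstract states that training uses reinforcement learning with verifiable rewards and requires only binary labels. As a rough illustration of what a verifiable reward for binary detection might look like, here is a minimal Python sketch. The abstract does not specify the reward design, so everything here is an assumption for illustration: the `<think>`/`<answer>` output template, the `real`/`fake` label names, the function names, and the `format_weight` value are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical output template (an assumption, not from the paper):
# free-form reasoning in <think>...</think>, then a verdict in <answer>...</answer>.
_FORMAT = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>\s*(real|fake)\s*</answer>\s*$",
    re.DOTALL | re.IGNORECASE,
)
_ANSWER = re.compile(r"<answer>\s*(real|fake)\s*</answer>", re.IGNORECASE)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed template, else 0.0."""
    return 1.0 if _FORMAT.match(response) else 0.0


def accuracy_reward(response: str, label: str) -> float:
    """Return 1.0 if exactly one verdict is present and it equals the binary label."""
    verdicts = _ANSWER.findall(response)
    if len(verdicts) != 1:
        # Missing or ambiguous verdicts cannot be verified, so they earn nothing.
        return 0.0
    return 1.0 if verdicts[0].lower() == label.lower() else 0.0


def total_reward(response: str, label: str, format_weight: float = 0.5) -> float:
    """Combine both signals; the weighting is an arbitrary illustrative choice."""
    return accuracy_reward(response, label) + format_weight * format_reward(response)


if __name__ == "__main__":
    sample = "<think>Textures look oversmoothed.</think><answer>fake</answer>"
    print(total_reward(sample, "fake"))
    print(total_reward(sample, "real"))
```

Both reward terms can be checked automatically against the ground-truth label, which is why a scheme of this kind needs no text annotations; the actual rewards used by AIGID-RFT may differ.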