DMIL-Net: A Multi-View Fusion and Region Decoupling Network For Diffusion-Based Generative Image Forgery Localization

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Generative Image Forgery Localization; Image Manipulation Detection; Forgery Image Dataset; Diffusion Model
TL;DR: We introduce DMIL-Net to effectively localize generative forgeries produced by diffusion-based image inpainting through multi-view feature fusion and hierarchical region decoupling.
Abstract: The iteration and popularization of diffusion models have significantly lowered the barrier to high-quality image forgery, posing severe challenges to image authenticity forensics. To address local forgeries produced by diffusion models, this paper proposes a forgery localization method named DMIL-Net. Specifically, we first design a multi-view feature learning strategy that integrates RGB, noise, and high-frequency views to capture the edge-fusion and denoising artifacts specific to the diffusion generation process, thereby providing clues for accurate localization. Second, considering the inherent differences in artifact characteristics between the main content regions and the edge detail regions generated by diffusion models, we propose a tampered-region decoupling and integration strategy that iteratively decouples and integrates main regions and detail regions to achieve more precise localization. In addition, we construct the DMI dataset, which contains 50,000 generative forgery images created via five prevalent diffusion-based generative image forgery methods, to support model training and testing. Experimental results show that DMIL-Net outperforms five mainstream methods in localization performance, generalization, extensibility, and robustness.
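To make the multi-view idea concrete, the sketch below shows one way the three views described in the abstract could be constructed and fused; it is an illustrative assumption, not the authors' implementation. The Gaussian-residual noise view, the Laplacian high-frequency view, and the convolutional fusion stem are all hypothetical stand-ins for the paper's actual modules.

```python
# Minimal sketch of multi-view input construction and fusion (assumed design,
# not DMIL-Net's actual architecture): RGB view, noise-residual view, and
# high-frequency view are concatenated channel-wise and passed to a conv stem.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> torch.Tensor:
    """2-D Gaussian kernel of shape (1, 1, size, size), normalized to sum to 1."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).unsqueeze(0)
    return (g.T @ g).view(1, 1, size, size)


class MultiViewFusion(nn.Module):
    """Builds RGB / noise / high-frequency views and fuses them with a conv stem."""

    def __init__(self, out_channels: int = 64):
        super().__init__()
        # Depthwise Gaussian filter (one copy per RGB channel) for a smoothed image.
        self.register_buffer("gauss", gaussian_kernel(5, 1.0).repeat(3, 1, 1, 1))
        # Laplacian kernel applied per channel as a simple high-frequency extractor.
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("laplacian", lap.view(1, 1, 3, 3).repeat(3, 1, 1, 1))
        # 9 input channels: 3 (RGB) + 3 (noise residual) + 3 (high frequency).
        self.stem = nn.Sequential(
            nn.Conv2d(9, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        # Noise view: residual between the image and its Gaussian-smoothed copy,
        # a cheap proxy for denoising-artifact cues mentioned in the abstract.
        smoothed = F.conv2d(rgb, self.gauss, padding=2, groups=3)
        noise_view = rgb - smoothed
        # High-frequency view: per-channel Laplacian response (edge/texture cues).
        hf_view = F.conv2d(rgb, self.laplacian, padding=1, groups=3)
        fused = torch.cat([rgb, noise_view, hf_view], dim=1)
        return self.stem(fused)


if __name__ == "__main__":
    model = MultiViewFusion()
    x = torch.rand(1, 3, 256, 256)  # dummy RGB image in [0, 1]
    print(model(x).shape)  # torch.Size([1, 64, 256, 256])
```

In this sketch, fusion is plain channel concatenation followed by a shared convolution; the paper's method may instead use per-view encoders with attention-based fusion, and the region decoupling-and-integration stage is not depicted here.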
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22504