DMIL-Net: A Multi-View Fusion and Region Decoupling Network For Diffusion-Based Generative Image Forgery Localization
Keywords: Generative Image Forgery Localization; Image Manipulation Detection; Forgery Image Dataset; Diffusion Model
TL;DR: We introduce DMIL-Net to effectively localize generative forgeries produced by diffusion-based image inpainting through multi-view feature fusion and hierarchical region decoupling.
Abstract: As image generation technology is increasingly applied to artistic creation and image editing, its potential for misuse in image forgery has become equally prominent, posing new challenges for verifying image authenticity. To address this issue, we propose DMIL-Net. First, we design a multi-view feature learning strategy that combines RGB, noise, and high-frequency views to fully capture cues from forgery regions. Second, we introduce multi-level contrastive learning to capture long-range dependencies across modalities, enabling better fusion of the multi-view features. Finally, we propose a forgery region decoupling and integration strategy that iteratively decouples and integrates the body and detail regions to produce localization results that are both complete and detail-accurate. In addition, we construct the DMI dataset, containing 50,000 generative forgery images created with five prevalent diffusion-based generative image forgery methods, to support model training and testing. Experimental results show that DMIL-Net outperforms five mainstream methods in localization performance, generalization, extensibility, and robustness.
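The multi-view idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the specific filters (a box-blur residual for the noise view, a Laplacian-style high-pass for the high-frequency view) are our assumptions, chosen only to show how three views of one image can be stacked for fusion.

```python
import numpy as np

def multi_view_features(rgb):
    """Build three views of an image (RGB, noise, high-frequency).

    Loosely follows the paper's multi-view strategy; the exact
    filters here are illustrative assumptions, not DMIL-Net's.
    """
    gray = rgb.mean(axis=2)  # luminance proxy for the filter views
    # Noise view: residual after a 3x3 box-blur "denoiser".
    pad = np.pad(gray, 1, mode="edge")
    blur = sum(pad[i:i + gray.shape[0], j:j + gray.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    noise = gray - blur
    # High-frequency view: Laplacian-style high-pass filter.
    hf = 4 * gray
    hf -= np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
    hf -= np.roll(gray, 1, 1) + np.roll(gray, -1, 1)
    # Fuse: stack RGB, noise, and high-frequency channels for a
    # downstream localization network to consume.
    return np.concatenate([rgb, noise[..., None], hf[..., None]], axis=2)

img = np.random.rand(8, 8, 3)
feats = multi_view_features(img)
print(feats.shape)  # (8, 8, 5): 3 RGB + 1 noise + 1 high-frequency channel
```

In DMIL-Net the fusion is learned (via multi-level contrastive learning) rather than a fixed channel concatenation; the sketch only shows the input views.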
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22504