Abstract: Because manipulated images can lead to misinterpretation of visual content, image forgery detection and localization (IFDL) has become a serious public concern. In this work, we start from a simple assumption: an effective forensic method should focus on the mesoscopic properties of images. Inspired by this, we propose DiffForensics, a novel two-stage self-supervised framework for the IFDL task based on a diffusion model. DiffForensics begins with a self-supervised denoising diffusion paradigm built on an encoder-decoder structure. The pre-trained encoder (e.g., pre-trained on ADE-20K) is frozen so that it retains macroscopic features describing general image characteristics, while the decoder is encouraged to learn microscopic feature representations; together, the whole model focuses on mesoscopic representations. The pre-trained model then serves as a prior and is fine-tuned for the IFDL task with a customized Edge Cue Enhancement Module (ECEM), which progressively highlights boundary features within the manipulated regions and thereby localizes tampered areas more precisely. Extensive experiments on several challenging public datasets demonstrate the effectiveness of the proposed method compared with other state-of-the-art methods. The proposed DiffForensics significantly improves the model’s capabilities for both accurate tamper detection and precise tamper localization, while also improving its generalization and robustness.
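The abstract does not specify the denoising objective or the network architecture of the first stage. The toy sketch below only illustrates the parameter-freezing idea: it assumes the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, a linear beta schedule, and an eps-prediction (noise-prediction) loss. The pre-trained encoder and the decoder are replaced by single hypothetical scalar weights (`w_enc`, `w_dec`). All function names are illustrative, not taken from the paper.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed DDPM default)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def train_decoder_only(pixels, w_enc, w_dec, steps=200, lr=0.05, num_steps=1000, seed=0):
    """Toy stage 1 (hypothetical): the 'encoder' weight w_enc is frozen and never
    updated; only the 'decoder' weight w_dec is trained to predict the added noise."""
    rng = random.Random(seed)
    alpha_bars = linear_alpha_bars(num_steps)
    n = len(pixels)
    losses = []
    for _ in range(steps):
        t = rng.randrange(num_steps)                   # random diffusion timestep
        eps = [rng.gauss(0.0, 1.0) for _ in pixels]    # Gaussian noise to predict
        x_t = q_sample(pixels, eps, alpha_bars[t])
        feats = [w_enc * x for x in x_t]               # frozen encoder features
        eps_hat = [w_dec * f for f in feats]           # trainable decoder prediction
        loss = sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / n
        # d(loss)/d(w_dec); no gradient is ever applied to w_enc
        grad = sum(2.0 * (p - e) * f for p, e, f in zip(eps_hat, eps, feats)) / n
        w_dec -= lr * grad
        losses.append(loss)
    return w_enc, w_dec, losses
```

For example, `train_decoder_only([0.2, -0.5, 0.9, -0.1], w_enc=0.8, w_dec=0.0)` returns the encoder weight unchanged and an updated decoder weight. In a real implementation, the same effect is usually obtained by excluding the encoder parameters from the optimizer (or disabling their gradients) rather than by writing the update by hand.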