M$^2$RL-Net: Multi-View and Multi-Level Relation Learning Network for Weakly-Supervised Image Forgery Detection
Abstract: As digital media manipulation becomes increasingly sophisticated, accurately detecting and localizing image forgeries with minimal supervision has become a critical challenge.
Existing weakly-supervised image forgery detection (W-IFD) methods typically rely on convolutional neural networks (CNNs) and explore internal relationships within images only to a limited extent, leading to poor detection and localization performance when only image-level labels are available.
To address these limitations, we introduce a novel Multi-View and Multi-Level Relation Learning Network (M$^2$RL-Net) for W-IFD.
M$^2$RL-Net effectively identifies forged images using only image-level annotations by exploring relationships between different views and hierarchical levels within images. To this end, it performs patch-level self-consistency learning (PSL) and feature-level contrastive learning (FCL) across views, enabling more generalized self-supervised learning of forgery features.
Specifically, PSL uses self-supervised learning to distinguish consistent from inconsistent regions within an image, strengthening the network's ability to accurately localize tampered areas.
FCL applies self-view and multi-view contrastive learning at the feature level to separate genuine from tampered features, improving the recognition of authentic and manipulated content across views.
Extensive experiments on various datasets demonstrate that M$^2$RL-Net outperforms existing weakly-supervised methods in both detection and localization accuracy. This research sets a new benchmark for weakly-supervised image forgery detection and lays a robust foundation for future studies in this field.
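To make the feature-level contrastive idea concrete, below is a minimal sketch of an InfoNCE-style cross-view contrastive loss, which is one plausible instantiation of FCL; the abstract does not specify the actual loss formulation, so the function name, tensor shapes, and temperature value here are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (not the authors' released code): an InfoNCE-style
# cross-view contrastive loss over paired feature embeddings.
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


def feature_contrastive_loss(feat_view_a: torch.Tensor,
                             feat_view_b: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """feat_view_a, feat_view_b: (N, D) feature embeddings from two views,
    where row i of both tensors describes the same image region (positive pair)."""
    a = F.normalize(feat_view_a, dim=1)
    b = F.normalize(feat_view_b, dim=1)
    logits = a @ b.t() / temperature              # (N, N) cross-view similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Diagonal entries are positives; all other rows serve as negatives.
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    va = torch.randn(8, 128)  # e.g., features from an RGB view
    vb = torch.randn(8, 128)  # e.g., features from a noise/frequency view
    print(feature_contrastive_loss(va, vb).item())
```

In this hypothetical setup, aligning matched regions across views while pushing apart unmatched ones encourages features that separate authentic from tampered content without pixel-level supervision.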