Image Tampering Localization Using Unified Two-Stream Features Enhanced with Channel and Spatial Attention
Abstract: Image tampering localization has attracted much attention in recent years. To differentiate between tampered and pristine image regions, many methods leverage the powerful feature learning ability of deep neural networks. Most of these methods operate either directly on the spatial image domain or on a residual domain constructed with high-pass filtering, while some take inputs from both domains and fuse the features just before making decisions. Though they have achieved promising performance, the gain from integrating the feature representations of the two domains is overlooked. In this paper, we show that learning a unified feature set is beneficial for tampering localization. In the proposed method, low-level features are first extracted from two input streams: one is the spatial image, and the other is a high-pass filtered residual image. The features are then separately enhanced with channel attention and spatial attention, and subsequently combined via early fusion to form a unified feature representation. The unified features play an important role under an adapted Mask R-CNN framework, achieving more accurate pixel-level tampering localization. Experimental results on five tampered image datasets demonstrate the effectiveness of the proposed method. The implementation is available at https://github.com/media-sec-lab/AEUF-Net .
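The pipeline in the abstract (high-pass residual stream, attention enhancement of each stream, early fusion) can be sketched with NumPy. This is only an illustrative toy, not the authors' implementation: the filter kernel, the squeeze-and-excitation-style channel gate, and the CBAM-style spatial gate are all stand-in choices of ours; the actual method uses learned deep features inside an adapted Mask R-CNN (see the linked repository).

```python
import numpy as np

def highpass_residual(img):
    """Apply a second-order high-pass (SRM-style) kernel per pixel.

    The specific 3x3 kernel is an illustrative assumption, not the
    paper's exact filter bank.
    """
    k = np.array([[-1,  2, -1],
                  [ 2, -4,  2],
                  [-1,  2, -1]], dtype=np.float64)
    h, w = img.shape
    pad = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return out

def channel_attention(feat):
    """Gate each channel by a sigmoid of its global average (SE-style)."""
    pooled = feat.mean(axis=(1, 2))                   # (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))              # sigmoid per channel
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Gate each spatial location by a sigmoid of the channel mean."""
    gate = 1.0 / (1.0 + np.exp(-feat.mean(axis=0)))   # (H, W)
    return feat * gate[None, :, :]

def early_fuse(spatial_feat, residual_feat):
    """Enhance both streams with attention, then concatenate channels
    into one unified feature representation."""
    enhance = lambda f: channel_attention(spatial_attention(f))
    return np.concatenate([enhance(spatial_feat), enhance(residual_feat)],
                          axis=0)
```

A constant (untampered, textureless) patch yields a zero residual under this kernel, since its coefficients sum to zero, which is the intuition for why residual-domain features highlight tampering traces that the spatial stream can miss.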