Investigating Translation Invariance and Shiftability in CNNs for Robust Multimedia Forensics: A JPEG Case Study

Edoardo Daniele Cannas, Sara Mandelli, Paolo Bestagini, Stefano Tubaro

Published: 01 Jan 2024, Last Modified: 05 Mar 2025, IH&MMSec 2024, CC BY-SA 4.0

Abstract: Convolutional Neural Networks (CNNs) have been the state of the art in many applications, including computer vision and multimedia forensics. Translation invariance is often cited among the reasons for their success. However, recent literature has shown that this property does not always hold, and that CNNs are in fact sensitive to small input translations and rotations. This phenomenon has been demonstrated for standard computer vision tasks like object classification, but the multimedia forensics literature has never investigated it. Forensic footprints are usually more subtle and prone to being concealed by post-processing operations; however, they show other appealing properties, such as periodic patterns, that analysts can exploit in forensic tasks. One example is JPEG compression, whose spatial periodicity is a clue for reconstructing the lifecycle of digital pictures. In this paper, we show that the translation invariance properties of CNNs are closely related to the intrinsic periodicity of the input data, exploring the particular case of JPEG-compressed images. Specifically, we test how CNNs change their behavior when processing compressed images whose pixels are misaligned with respect to the standard 8×8 JPEG grid, and we investigate solutions to mitigate these changes. Our results highlight some interesting relations between the properties of JPEG and CNNs' hyperparameters, such as the stride of the first convolutional layer.
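To make the link between stride and grid alignment concrete, the following is a minimal sketch and not the authors' experimental setup. It uses a 1-D circular cross-correlation as a stand-in for a first convolutional layer. The signal, the filter, the stride of 8, and all function names are illustrative assumptions. It checks a general property of strided filtering: a shift of the input by a multiple of the stride only shifts the output, whereas a shift by a single sample generally changes the output values.

```python
import random


def strided_corr(x, k, stride):
    """Circular 1-D cross-correlation, keeping every `stride`-th output."""
    n = len(x)
    return [
        sum(k[j] * x[(i + j) % n] for j in range(len(k)))
        for i in range(0, n, stride)
    ]


def shift(x, s):
    """Circularly shift a list to the right by s samples."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists."""
    return max(abs(p - q) for p, q in zip(a, b))


random.seed(0)
n, stride = 64, 8  # stride 8 chosen to mirror the 8-pixel JPEG block size
x = [random.random() for _ in range(n)]        # stand-in for one image row
k = [random.uniform(-1, 1) for _ in range(5)]  # hypothetical learned filter

y = strided_corr(x, k, stride)

# Input shifted by exactly one stride: the output is the same sequence,
# moved by one position.
y_aligned = strided_corr(shift(x, stride), k, stride)
err_aligned = max_abs_diff(y_aligned, shift(y, 1))

# Input shifted by one sample: compare against every possible shift of y
# and keep the best match.
y_misaligned = strided_corr(shift(x, 1), k, stride)
err_misaligned = min(
    max_abs_diff(y_misaligned, shift(y, s)) for s in range(len(y))
)

print("aligned shift error:   ", err_aligned)
print("misaligned shift error:", err_misaligned)
```

In this toy setting the layer is equivariant only to shifts that are multiples of its stride. This is one way to read the abstract's interest in images whose pixels are misaligned with the 8×8 JPEG grid, although the paper's actual models and data are not reproduced here.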