Advancing Quantization Steps Estimation: A Two-Stream Network Approach for Enhancing Robustness

Published: 20 Jul 2024, Last Modified: 31 Jan 2025. Venue: MM2024 Poster. Readers: Everyone. License: CC BY 4.0
Abstract: In Joint Photographic Experts Group (JPEG) image steganalysis and forensics, the quantization step can reveal the history of operations applied to an image. Several methods for estimating the quantization step have been proposed, but existing algorithms do not account for robustness, which limits their application. To address this limitation, we propose a two-stream network structure based on Swin Transformer. The spatial domain features of JPEG images exhibit strong robustness but low accuracy, whereas frequency domain features demonstrate high accuracy but weak robustness. We therefore design a two-stream network that uses the multi-scale features of Swin Transformer to extract spatial domain features with high robustness and frequency domain features with high accuracy. Furthermore, to adaptively fuse the frequency domain and spatial domain features, we design a Spatial-frequency Information Dynamic Fusion (SIDF) module that dynamically allocates weights. Finally, we change the network from a regression model to a classification model to speed up convergence and improve accuracy. Experimental results show that the accuracy of the proposed method is higher than 98% on clean images, while in robustness experiments the proposed algorithm maintains an average accuracy of over 81%.
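The abstract does not describe the internals of the SIDF module or of the classification head. As a rough sketch only, assuming the fusion can be approximated by a softmax gate over the two streams and that each candidate quantization step is treated as one class, the idea might look like the following. The names `dynamic_fusion`, `classify_step`, the gate vectors, and the class weights are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(spatial, frequency, gate_spatial, gate_frequency):
    """Fuse two equal-length feature vectors with input-dependent weights.

    Assumption: each stream gets a scalar score (dot product with a
    hypothetical learned gate vector); a softmax turns the two scores
    into weights that sum to 1.
    """
    if len(spatial) != len(frequency):
        raise ValueError("feature vectors must have the same length")
    score_s = sum(g * x for g, x in zip(gate_spatial, spatial))
    score_f = sum(g * x for g, x in zip(gate_frequency, frequency))
    w_s, w_f = softmax([score_s, score_f])
    fused = [w_s * s + w_f * f for s, f in zip(spatial, frequency)]
    return fused, (w_s, w_f)


def classify_step(fused, class_weights, steps):
    """Treat estimation as classification: one class per candidate step.

    Assumption: a single linear layer (class_weights, one row per class)
    followed by softmax; the predicted step is the most probable class.
    """
    logits = [sum(w * x for w, x in zip(row, fused)) for row in class_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return steps[best], probs
```

In a real model the gate and classifier parameters would be learned jointly with the two Swin Transformer streams; here they are plain lists so the weighting and the regression-to-classification change can be seen in isolation.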
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: To the best of our knowledge, we are the first to consider the robustness of quantization step estimation. Existing algorithms have low accuracy on noisy images, which limits their application. To improve robustness, we propose a two-stream network structure based on Swin Transformer. The spatial domain features of JPEG images exhibit strong robustness but low accuracy, whereas frequency domain features demonstrate high accuracy but weak robustness. We therefore design a two-stream network that uses the multi-scale features of Swin Transformer to extract spatial domain features with high robustness and frequency domain features with high accuracy.
Submission Number: 1701