Abstract: Robust image watermarking embeds a watermark invisibly into a carrier image and extracts it successfully even under noise interference, enabling copyright confirmation and traceability. Although deep-learning-based watermarking methods can improve robustness by adding a noise simulation layer, few theoretical analyses of the codec structure have been conducted. Theoretical explainability provides the basis for designing a network architecture and guides its development. Building on the interpretability of convolutional networks, this paper analyzes the mathematical process of embedding and extracting watermarks in codecs and proposes a novel watermarking framework based on multi-layer watermark feature fusion. Specifically, the encoder can be a convolutional network of arbitrary depth, whereas the decoder needs only to adopt the corresponding deconvolution structure. To improve the quality and robustness of the generated watermarked image, the watermark is associated with an arbitrary layer feature space in the encoder. In the decoder, the network quickly converges to each original encoding feature space through the deconvolution structure, thus decoupling the watermark features. Finally, the watermark is extracted via the automatic fusion of multi-layer watermark features. Experimental results show that the proposed method is suitable for few-shot learning and that its invisibility, robustness, and generalization performance on multiple datasets are significantly better than those of other advanced methods.
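The final extraction step, fusing watermark features recovered at several decoder layers, can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the code below assumes a softmax-weighted average of per-layer watermark logits followed by thresholding. The names `fuse_watermark_logits`, `decode_bits`, and `layer_scores` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_watermark_logits(per_layer_logits, layer_scores):
    """Fuse per-layer watermark logits with softmax-normalized layer weights.

    per_layer_logits: list of L lists, one logit per watermark bit.
    layer_scores: list of L floats; assumed here to be learnable
        parameters (an illustrative choice, not the paper's stated method).
    Returns one fused logit per watermark bit.
    """
    if len(per_layer_logits) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)
    n_bits = len(per_layer_logits[0])
    fused = [0.0] * n_bits
    for weight, logits in zip(weights, per_layer_logits):
        if len(logits) != n_bits:
            raise ValueError("all layers must predict the same number of bits")
        for i, value in enumerate(logits):
            fused[i] += weight * value
    return fused


def decode_bits(fused_logits):
    """Threshold fused logits at zero to recover watermark bits."""
    return [1 if value > 0 else 0 for value in fused_logits]


# Three hypothetical decoder layers, each estimating a 4-bit watermark.
per_layer = [
    [2.0, -1.5, 0.3, -0.2],
    [1.0, -2.0, -0.4, -1.0],
    [0.5, -0.5, 0.6, -3.0],
]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform layer weights
bits = decode_bits(fuse_watermark_logits(per_layer, scores))
```

With equal scores, every layer contributes equally, so a bit that one layer estimates weakly or wrongly (the third bit in the second layer above) can be outvoted by the others. In a trained network, the scores would presumably be learned so that more reliable layers dominate.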
External IDs: doi:10.1109/tmm.2025.3543079