Abstract: Most existing deep learning-based image steganography schemes neglect the latent space of images. They simply concatenate image feature vectors, which leads to low feature utilization, low steganographic image quality, and poor robustness. This paper introduces the latent diffusion model into an image steganography scheme named SteDM. SteDM first uses an encoder to map the cover image and the secret image into the latent space, then employs a cross-attention mechanism to fuse them during the reverse diffusion process. A decoder then produces a steganographic image that contains the secret image features. The extraction process likewise employs a latent-space diffusion model. The training loss jointly optimizes the autoencoder and the diffusion model. Experimental results demonstrate that SteDM outperforms existing steganography schemes in aspects such as visual quality, security, and robustness.
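The fusion step named in the abstract can be illustrated with a minimal single-head cross-attention sketch. This is not the SteDM implementation: the abstract does not state which latent supplies the queries, so the sketch assumes queries come from the cover latent and keys and values from the secret latent, and it assumes a residual addition onto the cover latent. All names (`cross_attention`, `cover_latent`, `secret_latent`, `w_q`, `w_k`, `w_v`), the token counts, and the random projection matrices are hypothetical stand-ins for encoder outputs and learned weights.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(cover, secret, w_q, w_k, w_v):
    """Fuse secret-latent tokens into cover-latent tokens.

    Assumption (not confirmed by the abstract): queries are projected from the
    cover latent, keys and values from the secret latent, and the attended
    result is added back onto the cover latent as a residual.
    Returns the fused tokens and the attention weights.
    """
    q = matmul(cover, w_q)
    k = matmul(secret, w_k)
    v = matmul(secret, w_v)
    scale = math.sqrt(len(q[0]))           # scaled dot-product attention
    k_t = [list(col) for col in zip(*k)]   # transpose keys
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    fused = [[c + a for c, a in zip(c_row, a_row)]
             for c_row, a_row in zip(cover, attended)]
    return fused, weights


def random_matrix(rows, cols, rng):
    """Gaussian matrix used here as a placeholder for latents and learned weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4                                        # hypothetical latent channel width
cover_latent = random_matrix(6, dim, rng)      # 6 cover tokens
secret_latent = random_matrix(5, dim, rng)     # 5 secret tokens
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

fused_latent, attn = cross_attention(cover_latent, secret_latent, w_q, w_k, w_v)
print(len(fused_latent), len(fused_latent[0]))  # fused latent keeps the cover latent's shape
```

In a latent diffusion pipeline, a block like this would sit inside the denoising network and be applied at each denoising step, with the decoder mapping the final fused latent back to image space; those surrounding components are omitted here.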
External IDs: doi:10.1007/978-981-96-1551-3_1