Keywords: Image Steganography, Multimodal
Abstract: Driven by advances in deep learning and data accessibility, image steganography has become a critical and widely used tool for information hiding. Image steganography embeds secret data within cover images and later recovers it. With the increasing variety and volume of data, steganography for multi-modal secret data is urgently required. However, existing image steganography frameworks typically embed multi-modal secret information into cover images sequentially, one modality at a time, which leads to unsatisfactory steganography performance. This implies that current image steganography frameworks are modality-specific: each is effective mainly for hiding secret data of one particular modality. **This paper presents a unified framework for multi-modal secret data steganography, which is capable of concurrently concealing image, text, and audio data within a cover image and permits reversible recovery**. Two principal challenges arise: (1) catastrophic forgetting seriously undermines consistent performance across the different modalities of secret data; (2) mitigating catastrophic forgetting in turn induces significant interference from intra- and inter-modal information conflicts among the distinct modalities of secret data and the cover images, compromising steganography fidelity. **To achieve coherent multi-modal secret data knowledge preservation and interaction, our unified framework first establishes a coordinated coupling between steganography tasks and continual learning** to preserve learned multi-modal knowledge, maintaining model learning capacity and performance stability. **Subsequently, a Multi-Gap Collaborative Fusion mechanism utilizes cover images as anchors to effectively integrate multi-modal knowledge**, resolving intra- and inter-modal conflicts while bolstering security through directed secret data customization and encryption. Experiments demonstrate that our model achieves secure and high-quality multi-modal secret data steganography, outperforming existing state-of-the-art (SOTA) methods.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2698