Abstract: Voice-over-IP (VoIP) steganography has matured into a critical branch of information hiding, yet nearly all prior work targets traditional codecs such as Adaptive Multi-Rate and G.723. In contrast, OPUS, the current de facto codec standard on mainstream voice platforms, remains largely unexplored for VoIP steganography. To address this gap, this paper presents a systematic and comprehensive investigation of steganography in OPUS speech streams. Our study begins with an analysis of steganographic performance in the pulse parameter domain, identifying advantages such as non-continuity and a preference for silent frames. Building on this analysis, we propose two novel steganographic approaches: an inter-frame technique featuring a deep reinforcement learning-based position selection method, and an intra-frame technique relying on pulse modulation. The position selection method formulates speech steganography as a continuous control task, capitalizing on the inherent insensitivity of silent speech for information embedding and globally optimizing embedding positions through deep reinforcement learning. The pulse modulation method, on the other hand, embeds information by exploiting the randomness of pulses and their tendency toward zero. Experimental results demonstrate that the pulse modulation method achieves excellent steganographic transparency and capacity, while the position selection method significantly improves the steganographic transparency of various intra-frame steganographic techniques. Notably, this study not only pioneers information hiding in OPUS speech streams but also marks the first successful application of deep reinforcement learning to speech steganography. The proposed methods are not only applicable to OPUS but can also be extended to other modern codecs such as SILK.
External IDs: doi:10.1109/taslpro.2025.3624951