Promising Multi-Granularity Linguistic Steganography by Jointing Syntactic and Lexical Manipulations

Published: 2025, Last Modified: 06 Jan 2026, AAAI 2025, CC BY-SA 4.0
Abstract: Existing modification-based linguistic steganography methods primarily perform linguistic manipulations within a single embedding space to conceal secret information. However, these methods are strictly constrained by the original semantics of the cover text, which makes it difficult to achieve satisfactory embedding capacity within a single embedding space. In this paper, we propose a novel Multi-granularity Modification-based Linguistic Steganography framework (MMLS) that hides secret information in both the syntactic space and the symbolic space, enhancing syntactic naturalness and semantic coherence while further increasing embedding capacity. Specifically, MMLS uses a paraphrase generation model to automatically modify the syntactic structure of the given original sentence, producing paraphrases while preserving their semantics. Moreover, MMLS employs a distance-aware syntactic bins coding strategy to embed part of the secret information into the syntactic space. This strategy uses clustering to partition the implicit syntactic space into a finite number of separate zones, which increases the number of candidate paraphrases and avoids selecting semantically distorted steganographic texts. Finally, a pre-trained BERT model replaces some words in the candidate paraphrases with their synonyms. This design embeds the remaining secret information into the symbolic space while ensuring syntactic and semantic naturalness. Experimental results demonstrate that MMLS significantly outperforms existing methods in terms of semantic coherence, embedding capacity, and security.
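The abstract describes a two-stage embedding: a syntactic stage that encodes bits through the choice of paraphrase (via the zone its syntactic representation falls into), followed by a lexical stage that encodes the remaining bits through synonym choice. The Python sketch below is a toy illustration of that idea under stated assumptions, not the MMLS implementation. Everything in it is hypothetical: `SYNTAX` hard-codes a few paraphrases with made-up 2-D "syntactic vectors" and semantic-similarity scores (standing in for the paraphrase model and a syntax encoder), `CENTROIDS` stands in for zones obtained by clustering, and `GROUPS` stands in for BERT-proposed synonyms. Zones are assigned by nearest centroid, and within the target zone the highest-scoring paraphrase is kept; this is only one possible reading of "distance-aware syntactic bins coding".

```python
from math import dist, log2

# --- Hypothetical stand-ins (not from the paper) --------------------------
# Synonym groups play the role of BERT-proposed substitutes. Every member maps
# to its whole group so the receiver can recognise substituted words.
GROUPS = [["big", "large"], ["small", "little"]]
LOOKUP = {w: g for g in GROUPS for w in g}

# Candidate paraphrases in canonical form -> (toy syntactic vector, semantic
# similarity score). A real system would compute these with trained models.
SYNTAX = {
    "the big dog chased the small cat": ((0.1, 0.1), 1.0),
    "the big dog went after the small cat": ((0.2, 0.0), 0.7),
    "the small cat was chased by the big dog": ((0.1, 0.9), 0.9),
    "it was the big dog that chased the small cat": ((0.9, 0.2), 0.85),
    "what the big dog chased was the small cat": ((0.8, 0.9), 0.8),
}
# Four zones in the syntactic space, so the zone index carries 2 bits.
# In practice these centroids would come from clustering many paraphrases.
CENTROIDS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ZONE_BITS = int(log2(len(CENTROIDS)))


def nearest_zone(vec):
    """Index of the closest zone centroid (distance-based zone assignment)."""
    return min(range(len(CENTROIDS)), key=lambda i: dist(vec, CENTROIDS[i]))


def canonical(tokens):
    """Map each synonym to its group's first word, undoing lexical edits."""
    return " ".join(LOOKUP[t][0] if t in LOOKUP else t for t in tokens)


def embed(bits):
    """Hide `bits` first in syntax (zone choice), then in lexis (synonyms)."""
    zone = int(bits[:ZONE_BITS].ljust(ZONE_BITS, "0"), 2)
    in_zone = [s for s, (v, _) in SYNTAX.items() if nearest_zone(v) == zone]
    if not in_zone:
        raise ValueError("no paraphrase falls in the required zone")
    # Among paraphrases in the target zone, keep the most meaning-preserving.
    tokens = max(in_zone, key=lambda s: SYNTAX[s][1]).split()
    rest, out = bits[ZONE_BITS:], []
    for tok in tokens:
        if tok in LOOKUP:  # substitutable slot: synonym index encodes bits
            group = LOOKUP[tok]
            c = int(log2(len(group)))
            chunk, rest = rest[:c].ljust(c, "0"), rest[c:]
            out.append(group[int(chunk, 2)])
        else:
            out.append(tok)
    if rest:
        raise ValueError("message exceeds the capacity of this cover")
    return " ".join(out)


def extract(stego, n_bits):
    """Recover the first `n_bits` bits (message length assumed known)."""
    tokens = stego.split()
    vec, _ = SYNTAX[canonical(tokens)]  # synonym swaps leave syntax unchanged
    bits = format(nearest_zone(vec), f"0{ZONE_BITS}b")
    for tok in tokens:
        if tok in LOOKUP:
            group = LOOKUP[tok]
            c = int(log2(len(group)))
            bits += format(group.index(tok), f"0{c}b")
    return bits[:n_bits]
```

For example, `embed("1101")` selects the pseudo-cleft paraphrase (zone 3, bits `11`) and then writes `0` and `1` into the two synonym slots, giving `"what the big dog chased was the little cat"`; `extract` recovers the bits by mapping synonyms back to their canonical form before locating the zone. Each cover here carries 2 syntactic plus 2 lexical bits, which mirrors the abstract's point that using both spaces raises capacity. The shared static synonym dictionary sidesteps a real difficulty of BERT-based substitution, namely that the receiver must be able to regenerate the same candidate list from the stego text.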