GenMark: An Embedded Watermarking Scheme for Generative Audio Synthesis

ICLR 2026 Conference Submission17117 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Readers: Everyone · CC BY 4.0
Keywords: Neural Audio Generation, Audio Watermarking, Robust Watermark Embedding, Speech Synthesis
TL;DR: GenMark is a training-time audio watermarking method that embeds watermarks directly into generative decoders, improving robustness against removal while preserving perceptual quality.
Abstract: Audio watermarking provides an effective approach for tracing and protecting synthetic audio content. Traditional methods often apply watermarking as a post-processing step, which leaves the watermark vulnerable to removal or degradation through signal processing or code editing. To address these issues, we introduce GenMark, an approach that embeds watermarks directly into the decoder of neural audio generation models during training. GenMark combines time-frequency perceptual losses, a mask-based localization model, and adversarial training to ensure high audio quality and watermark robustness. Experimental results on speech and music generation tasks demonstrate superior detection accuracy (TPR: 99.9% for speech, 100.0% for music). GenMark also preserves perceptual quality, with less than 2% degradation in MUSHRA scores, establishing it as a strong candidate for practical and secure watermarking in generative audio systems.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17117