Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Published: 02 May 2024, Last Modified: 25 Jun 2024, Venue: ICML 2024 Poster, License: CC BY 4.0
Abstract: Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin *ZEro-shot Text-based Audio (ZETA)* editing, is adopted from the image domain. The second, named *ZEro-shot UnSupervized (ZEUS)* editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found on our [examples page](https://hilamanor.github.io/AudioEditing/).
Submission Number: 911
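
The abstract names DDPM inversion without describing it. As a rough illustration only, the sketch below shows one common formulation, in which noisy versions of the signal are drawn independently at every timestep and the per-step noise vectors are then solved for, so that rerunning the sampler with those vectors reproduces the input. Whether this matches the paper's exact procedure is an assumption. The function names (`make_schedule`, `toy_eps_model`, `ddpm_invert`, `ddpm_sample`), the linear noise schedule, and the choice of step variance are all hypothetical, and `toy_eps_model` is a stand-in for a pre-trained, text-conditioned audio diffusion model.

```python
import numpy as np

def make_schedule(T, beta_start=1e-4, beta_end=2e-2):
    # Linear beta schedule (an assumption; any valid schedule would do).
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_eps_model(x, t):
    # Hypothetical stand-in for a pre-trained noise predictor.
    return 0.5 * np.tanh(x)

def step_mean(x_t, t, betas, alphas, alpha_bars, eps_model):
    # Standard DDPM reverse-step mean computed from the predicted noise.
    eps = eps_model(x_t, t)
    return (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])

def ddpm_invert(x0, betas, alphas, alpha_bars, eps_model, rng):
    T = len(betas)
    # xs[k] holds x_k; each noisy version uses its own independent noise draw.
    xs = [x0]
    for t in range(T):
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise)
    # Solve x_{k-1} = mean(x_k) + sigma_k * z_k for z_k, from k = T down to 1.
    zs = []
    for k in range(T, 0, -1):
        t = k - 1
        mu = step_mean(xs[k], t, betas, alphas, alpha_bars, eps_model)
        zs.append((xs[k - 1] - mu) / np.sqrt(betas[t]))
    return xs[T], zs

def ddpm_sample(x_T, zs, betas, alphas, alpha_bars, eps_model):
    # Rerun the reverse process with the extracted noise vectors.
    x = x_T
    for k, z in zip(range(len(betas), 0, -1), zs):
        t = k - 1
        x = step_mean(x, t, betas, alphas, alpha_bars, eps_model) + np.sqrt(betas[t]) * z
    return x
```

With the same noise predictor used for inversion and sampling, `ddpm_sample` recovers the original signal up to floating-point error. In this picture, an edit corresponds to keeping the extracted noise vectors fixed while changing what the predictor is conditioned on, or how its output is perturbed, during sampling; the toy predictor here has no conditioning, so that part is not shown.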