Improving Source Extraction with Diffusion and Consistency Models

Tornike Karchkhadze; Mohammad Rasool Izadi; Shuo Zhang

26 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0

Keywords: source extraction, consistency models, score-matching diffusion

TL;DR: We integrate a score-matching diffusion model with consistency distillation into a deterministic architecture for time-domain musical source extraction, significantly improving performance on the Slakh2100 dataset.

Abstract: In this work, we demonstrate the integration of a score-matching diffusion model into a deterministic architecture for time-domain musical source extraction, resulting in enhanced audio quality. To address the typically slow iterative sampling process of diffusion models, we apply consistency distillation and reduce the sampling process to a single step, achieving performance comparable to that of diffusion models, and with two or more steps, even surpassing them. Trained on the Slakh2100 dataset for four instruments (bass, drums, guitar, and piano), our model shows significant improvements across objective metrics compared to baseline methods. Sound examples are available at https://consistency-separation.github.io/.

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 7938
