Keywords: Music Generation, Source Extraction, Inpainting
TL;DR: We propose a unified latent diffusion framework for simultaneous music generation, source imputation, and query-driven extraction of arbitrary sources.
Abstract: We present MGE-LDM, a unified latent diffusion framework for simultaneous music generation, source imputation, and query-driven source separation.
Unlike prior approaches constrained to fixed instrument classes, MGE-LDM learns a joint distribution over full mixtures, submixtures, and individual stems within a single compact latent diffusion model.
At inference, MGE-LDM enables (1) complete mixture generation, (2) partial generation (i.e., source imputation), and (3) text-conditioned extraction of arbitrary sources.
By formulating both separation and imputation as conditional inpainting tasks in the latent space, our approach supports flexible, class-agnostic manipulation of arbitrary instrument sources.
Notably, MGE-LDM can be trained jointly across heterogeneous multi-track datasets (e.g., Slakh2100, MUSDB18, MoisesDB) without relying on predefined instrument categories.
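To make the inpainting formulation concrete, the sketch below (not the authors' code) shows how query-driven extraction could be run as conditional denoising in latent space: the mixture latent and a text-query embedding condition the sampler while the unknown stem latent is denoised from noise. The denoiser interface, shapes, and DDIM-style schedule are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of separation-as-latent-inpainting (illustrative assumptions only).
import torch

def inpaint_sample(denoiser, z_mix, query_emb, num_steps=50):
    """Denoise an unknown stem latent conditioned on the fixed mixture latent.

    denoiser(z, t, z_mix, query_emb) -> predicted noise (hypothetical interface
    standing in for the trained latent diffusion model).
    """
    z_stem = torch.randn_like(z_mix)               # stem latent starts as pure noise
    betas = torch.linspace(1e-4, 2e-2, num_steps)  # toy linear noise schedule
    alphas = torch.cumprod(1.0 - betas, dim=0)

    for t in reversed(range(num_steps)):
        eps = denoiser(z_stem, t, z_mix, query_emb)
        alpha_t = alphas[t]
        # Predict the clean latent, then take a deterministic DDIM-style step.
        z0_pred = (z_stem - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
        alpha_prev = alphas[t - 1] if t > 0 else torch.tensor(1.0)
        z_stem = alpha_prev.sqrt() * z0_pred + (1 - alpha_prev).sqrt() * eps
    return z_stem

# Toy usage with a dummy denoiser standing in for the trained model:
dummy = lambda z, t, z_mix, q: torch.zeros_like(z)
stem_latent = inpaint_sample(dummy, torch.randn(1, 64, 256), torch.randn(1, 512))
```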
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guideline and anonymized my submission.
Submission Number: 1