Keywords: Symbolic and Audio Music, Unified Latent Space, Band-Level Music Generation, Feature Extraction, Generative Models
TL;DR: We present UniComposer, a band-level music generation pipeline based on cascaded diffusion models, leveraging unified symbolic and audio features extracted from autoencoders.
Abstract: Multi-track deep music generation has largely focused on pre-specified structures and instruments. However, it remains a challenge to generate "band-level" full-length music that allocates instruments based on musical features, their expressive potential, and differences in their performance characteristics. Moreover, symbolic music and audio music representations have been treated as distinct sub-areas, without a unified architecture that combines their respective advantages. In this work, we introduce $\textbf{UniComposer}$, a novel music generation pipeline that composes at the band level. It uses a hierarchical multi-track music representation, complemented by four cascaded diffusion models that progressively generate rhythm features and unified features extracted from both symbolic and audio music by autoencoders. Experiments and analysis demonstrate that UniComposer achieves a unified latent space for symbolic and audio music and is capable of generating band-level compositions with well-structured multi-track arrangements, outperforming previous methods.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2187