HarmonyLM: Advancing Unified Large-Scale Language Modeling for Sound and Music Generation

15 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: large language model, sound generation, music generation
TL;DR: We introduce a unified perspective in modeling sound and music with discrete representations.
Abstract: The fields of sound generation and music generation have seen notable advances with the development of specialized models tailored to each domain. However, these domains share commonalities, and the use of separate specialized models can increase hardware resource requirements. Meanwhile, recent breakthroughs in large language models, particularly in natural language processing, have showcased their ability to capture complex patterns and generate coherent, contextually relevant outputs across a variety of tasks. Leveraging the success of these language models, we present HarmonyLM, a unified framework designed to synthesize sound and music from discrete representations. HarmonyLM adopts a unified perspective on modeling sound and music: discrete tokens are predicted from text descriptions by a decoder-only model and then converted back into harmonious and consistent audio outputs. As a unified sound and music generation framework, HarmonyLM offers significant advantages. (1) Model scalability: the acoustic model is a decoder-only transformer, which can be freely scaled up in size. (2) Data scalability: neither the acoustic model nor the audio reconstruction model requires any annotations, accommodating data at different scales. Experimental results demonstrate the effectiveness of HarmonyLM, which achieves superior audio quality compared to competitive baseline models. \footnote{Audio samples are available at \url{https://HarmonyLM.github.io}}
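The pipeline the abstract describes — text conditioning, autoregressive prediction of discrete acoustic tokens by a decoder-only model, and reconstruction of audio from those tokens — can be sketched as below. This is a minimal toy illustration, not the paper's implementation: `toy_decoder_step`, `codec_decode`, the vocabulary size, and the hop length are all hypothetical placeholders standing in for the trained transformer and neural codec.

```python
import random

VOCAB_SIZE = 1024   # hypothetical codec codebook size (assumption)
EOS = VOCAB_SIZE    # special end-of-sequence token

def toy_decoder_step(prefix, seed=0):
    """Stand-in for a decoder-only transformer: given the text-plus-audio
    token prefix, return the next discrete token (here sampled uniformly,
    deterministically from the prefix, instead of from learned logits)."""
    rng = random.Random(hash(tuple(prefix)) ^ seed)
    return rng.randrange(VOCAB_SIZE + 1)  # may emit EOS

def generate_audio_tokens(text_tokens, max_len=50, seed=0):
    """Autoregressively sample discrete acoustic tokens conditioned on
    the text tokens, stopping at EOS or max_len."""
    seq = list(text_tokens)
    audio = []
    for _ in range(max_len):
        nxt = toy_decoder_step(seq, seed)
        if nxt == EOS:
            break
        audio.append(nxt)
        seq.append(nxt)
    return audio

def codec_decode(tokens, hop=320):
    """Placeholder for the annotation-free codec decoder: each discrete
    token is expanded to one frame of `hop` waveform samples (zeros here)."""
    return [0.0] * (len(tokens) * hop)

# Usage: text tokens in, discrete acoustic tokens out, then a waveform.
tokens = generate_audio_tokens([1, 2, 3], max_len=20)
wave = codec_decode(tokens)
```

Because both the token-level acoustic model and the codec operate on raw audio alone, neither stage in this sketch consumes labeled data — which is the data-scalability point the abstract makes.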
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 136