Keywords: lyrics2song, song generation, long-form, foundation model, music generation
TL;DR: We scale up an open LM-based song generation model to match the performance of proprietary systems.
Abstract: We tackle the task of long-form music generation, particularly the challenging \textbf{lyrics-to-song} problem, by introducing \textbf{YuE (乐)}, a family of open-source music generation foundation models. Specifically,
YuE scales to trillions of tokens and generates up to five minutes of music while maintaining lyrical alignment, coherent musical structure, and engaging vocal melodies with appropriate accompaniment. It achieves this through \textbf{track-decoupled next-token prediction} to overcome the challenge of dense mixture signals, and through \textbf{structural progressive conditioning} for long-context lyrical alignment. In addition, we redesign the \textbf{in-context learning} technique for music generation, enabling bidirectional content creation and style cloning while improving musicality. Through extensive evaluation, we demonstrate that YuE matches or even surpasses some proprietary systems in musicality and vocal agility (as of 2025-01). We strongly encourage readers to \textbf{listen to our demo}\footnote{\url{https://yue-anonymous.github.io}}.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 13212