TL;DR: We introduce Fine-Grained Guidance (FGG) for diffusion-based symbolic music generation, enabling precise, real-time, interactive control over pitch and harmony, with strong theoretical and empirical results.
Abstract: Developing generative models for unconditional or conditional symbolic music generation presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion model to generate music that aligns more closely with the control and intent of expert composers, which is critical for improving the accuracy, listenability, and overall quality of the generated music. This approach enables diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations of both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and a subjective evaluation to demonstrate the effectiveness of our approach, and we have published a demo page that showcases performances and enables real-time interactive generation.
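To make the sampling-time control concrete, below is a minimal, hypothetical sketch of fine-grained guidance in a diffusion sampler, not the paper's actual implementation: it assumes a piano-roll representation in which each reverse-diffusion step produces a denoised estimate, and it projects that estimate onto the pitches permitted by a user-specified chord. All function names, the MIDI pitch-class encoding, and the -1/+1 note-activation convention are illustrative assumptions.

```python
# Hypothetical illustration of sampling-time fine-grained guidance:
# at each reverse-diffusion step, project the model's denoised
# piano-roll estimate onto the set of pitches allowed by the
# user-specified chord. All names here are illustrative.
import numpy as np

PITCHES = 128  # MIDI pitch range


def chord_pitch_mask(chord_pitch_classes, pitches=PITCHES):
    """Boolean mask over MIDI pitches whose pitch class is in the chord."""
    mask = np.zeros(pitches, dtype=bool)
    for p in range(pitches):
        if p % 12 in chord_pitch_classes:
            mask[p] = True
    return mask


def project_onto_chord(x0_hat, mask):
    """Suppress activations at pitches outside the chord.

    x0_hat: (pitches, time) denoised piano-roll estimate in [-1, 1],
    where positive values indicate an active note.
    """
    projected = x0_hat.copy()
    projected[~mask, :] = -1.0  # force disallowed pitches to "off"
    return projected


# Toy usage: constrain one denoising step to a C-major chord (C, E, G).
mask = chord_pitch_mask({0, 4, 7})
x0_hat = np.random.uniform(-1, 1, size=(PITCHES, 32))
x0_guided = project_onto_chord(x0_hat, mask)
assert (x0_guided[~mask] == -1.0).all()
```

Applying such a projection at every sampling step (rather than only at the end) is what lets the constraint shape the entire denoising trajectory; the actual FGG mechanism and its theoretical guarantees are detailed in the paper and the linked repository.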
Lay Summary: Creating music with AI is challenging because it requires both creativity and precision. A single wrong note in a melody can ruin the entire piece — much like a typo can change the meaning of a sentence. Yet most existing AI music systems can’t reliably follow detailed instructions, especially when the music data is limited.
Our research introduces a technique called Fine-Grained Guidance for diffusion models that helps them generate symbolic music (such as sheet music or MIDI) more accurately and in real time. This method ensures that each note fits correctly within the intended harmony and rhythm, just as a skilled musician would place it.
We tested our method on a dataset of pop music and showed that it produces better-sounding accompaniments than previous systems. Our approach also lets users, even those with limited technical skill, guide the music generation interactively by specifying chords or melodies.
This makes AI music tools more useful for composers, hobbyists, and educators, and opens the door to more intuitive collaboration between humans and machines in creative fields.
Link To Code: https://github.com/huajianduzhuo-code/FGG-music-code
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Symbolic Music Generation, Guided Diffusion Models, Controllable Generation
Submission Number: 8013