Vibe Sorcery: Integrating Emotion Recognition with Generative Music for Playlist Curation

Isabel Urrego-Gómez; Simon Colton; Iran R Roman

Published: 08 Sept 2025, Last Modified: 15 Sept 2025, LLM4Music @ ISMIR 2025 Poster, CC BY 4.0

Keywords: Emotion Recognition, Generative Music, Playlist Curation, Text-to-Audio Synthesis, Adaptive Listening Experiences, LLM, Music Recommendation, Music information retrieval

Abstract: Vibe Sorcery generates emotionally coherent playlists using text-to-audio synthesis. The system builds dynamic musical journeys through Markov-like transitions, in which each new track is conditioned only on its immediate predecessor. Its three components run sequentially: the Listener extracts audio features and predicts moods and genres, the Captioner converts these predictions into text prompts, and Stable Audio synthesizes matching tracks. Evaluations show significantly smoother emotional progression than random sampling (average distance in arousal-valence space: 0.82 vs. 2.4). This approach demonstrates how language-prompted audio generation can create controlled, adaptive listening experiences.
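The pipeline and metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `generate_chain`, `mean_consecutive_distance`, and the three `toy_*` stand-ins are hypothetical, and the use of Euclidean distance between consecutive (arousal, valence) points is an assumption, since the abstract does not specify how the distance is computed. The stand-ins only pass labels through; a real system would call an audio classifier and a text-to-audio model in their place.

```python
import math
from typing import Callable, Dict, List, Tuple

AV = Tuple[float, float]  # hypothetical (arousal, valence) coordinate
Track = Dict[str, str]    # toy track representation, not real audio


def generate_chain(
    seed: Track,
    n_tracks: int,
    listen: Callable[[Track], Dict[str, str]],
    caption: Callable[[Dict[str, str]], str],
    synthesize: Callable[[str], Track],
) -> List[Track]:
    """Markov-like chain: each new track depends only on the previous one."""
    playlist = [seed]
    for _ in range(n_tracks):
        features = listen(playlist[-1])       # Listener: predict mood and genre
        prompt = caption(features)            # Captioner: predictions -> text prompt
        playlist.append(synthesize(prompt))   # Synthesizer: prompt -> new track
    return playlist


def mean_consecutive_distance(points: List[AV]) -> float:
    """Average Euclidean distance between consecutive points (assumed metric)."""
    if len(points) < 2:
        return 0.0
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return sum(steps) / len(steps)


# Toy stand-ins so the sketch runs without any audio models.
def toy_listen(track: Track) -> Dict[str, str]:
    return {"mood": track["mood"], "genre": track["genre"]}


def toy_caption(features: Dict[str, str]) -> str:
    return f"A {features['mood']} {features['genre']} track"


def toy_synthesize(prompt: str) -> Track:
    # Recover the labels from the prompt; a real model would return audio.
    _, mood, genre, _ = prompt.split()
    return {"prompt": prompt, "mood": mood, "genre": genre}


playlist = generate_chain(
    {"mood": "calm", "genre": "ambient"}, 3, toy_listen, toy_caption, toy_synthesize
)
smoothness = mean_consecutive_distance([(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)])
```

Under this reading, a lower average step size means a smoother emotional trajectory, which is the direction of the comparison reported above (0.82 for the system vs. 2.4 for random sampling).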

Submission Number: 14
