Keywords: Emotion Recognition, Generative Music, Playlist Curation, Text-to-Audio Synthesis, Adaptive Listening Experiences, LLM, Music Recommendation, Music Information Retrieval
Abstract: Vibe Sorcery generates emotionally coherent playlists using text-to-audio synthesis. The system builds a dynamic musical journey through Markov-like transitions, in which each new track is conditioned only on its immediate predecessor. Its three components run in sequence: the Listener extracts audio features and predicts moods and genres, the Captioner converts these predictions into text prompts, and Stable Audio synthesizes matching tracks. Evaluations show significantly smoother emotional progression than random sampling (average distance in arousal-valence space: 0.82 vs. 2.4). This approach demonstrates how language-prompted audio generation can create controlled, adaptive listening experiences.
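The Markov-like loop and the smoothness metric described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `generate_playlist` and `mean_consecutive_av_distance`, the callables `listen`, `caption`, and `synthesize` (stand-ins for the Listener, the Captioner, and Stable Audio), and the choice of Euclidean distance between consecutive tracks are all assumptions, since the abstract does not specify them.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Track = TypeVar("Track")          # whatever object represents an audio track
AVPoint = Tuple[float, float]     # (arousal, valence) coordinates


def generate_playlist(
    seed: Track,
    listen: Callable[[Track], dict],      # hypothetical Listener: track -> features/moods/genres
    caption: Callable[[dict], str],       # hypothetical Captioner: predictions -> text prompt
    synthesize: Callable[[str], Track],   # hypothetical text-to-audio call (e.g. Stable Audio)
    length: int,
) -> List[Track]:
    """Build a playlist where each track depends only on the previous one."""
    playlist = [seed]
    for _ in range(length - 1):
        features = listen(playlist[-1])   # condition on the immediate predecessor only
        prompt = caption(features)
        playlist.append(synthesize(prompt))
    return playlist


def mean_consecutive_av_distance(points: Sequence[AVPoint]) -> float:
    """Average Euclidean distance between consecutive arousal-valence points.

    Assumption: the abstract's metric is taken between consecutive tracks
    and uses Euclidean distance; neither is stated in the source.
    """
    if len(points) < 2:
        return 0.0
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return sum(steps) / len(steps)
```

Under these assumptions, a lower value of `mean_consecutive_av_distance` for a generated playlist than for a randomly sampled one would correspond to the smoother emotional progression the abstract reports.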
Submission Number: 14