From Classification to Creative Interpretation: A Multimodal AI Chain for Music Mood Understanding

Published: 08 Sept 2025, Last Modified: 10 Sept 2025 · LLM4Music @ ISMIR 2025 Poster · CC BY 4.0
Keywords: Large Language Models, Multimodal Music Analysis, Creative AI, Music Emotion Recognition, Cross-Modal Alignment
Abstract: We present a novel paradigm for music understanding that positions large language models as creative interpreters. Our system transforms music emotion recognition from categorical classification into rich, contextual storytelling through an orchestrated CNN→LLM pipeline. A specialized CNN first analyzes the acoustic signal, producing a probability distribution across four mood categories. The LLM (Gemini 2.5 Flash) then serves as the creative heart of the system, synthesizing this sparse numerical data into human-centered narratives and mood-aligned recommendations. Unlike conventional approaches that output only rigid labels, our LLM-driven interpretation captures the nuanced, multifaceted nature of musical emotion from minimal numerical input. Deployed as a real-time web application, the system demonstrates how this architecture can reimagine music AI interfaces, achieving measurable gains in user engagement, including a +12.5% improvement in user satisfaction in a preliminary study.
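The abstract's core mechanism is the handoff from CNN to LLM: a four-way probability distribution is serialized into a prompt that asks the model for a narrative interpretation. The sketch below illustrates that prompt-construction step under stated assumptions: the mood category names, the prompt wording, and the function name are all hypothetical, not taken from the paper, and the actual LLM call (Gemini 2.5 Flash, per the abstract) is omitted.

```python
# Hypothetical sketch of the CNN→LLM handoff described in the abstract.
# The four mood labels and the prompt text are assumptions for illustration.
MOODS = ["happy", "sad", "energetic", "calm"]  # assumed category names

def build_interpretation_prompt(probs):
    """Serialize the CNN's sparse probability distribution into an LLM prompt."""
    assert len(probs) == len(MOODS), "expected one probability per mood category"
    assert abs(sum(probs) - 1.0) < 1e-6, "distribution should sum to 1"
    dist = ", ".join(f"{mood}: {p:.2f}" for mood, p in zip(MOODS, probs))
    return (
        "A mood classifier analyzed a song and produced this probability "
        f"distribution: {dist}. Write a short, human-centered narrative of "
        "the song's emotional character and suggest mood-aligned listening."
    )

# In the deployed system this prompt would be sent to the LLM; here we only
# show how the sparse numerical output becomes the model's creative input.
prompt = build_interpretation_prompt([0.62, 0.08, 0.22, 0.08])
```

Keeping the CNN output as an explicit distribution, rather than collapsing it to a single top label, is what lets the LLM reflect mixed or ambiguous moods in its narrative.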
Submission Number: 16