Keywords: contextual music recommendation, large language models, emotion detection, speech recognition, explainable AI, music retrieval, affective computing, multimodal interaction, conversational systems
TL;DR: We built a contextual music recommendation system using LLMs and emotion detection to trigger music that matches users’ feelings; it outperforms a keyword-based baseline in emotion detection, music relevance, and explanation quality.
Abstract: We present a novel system that integrates Large Language Models with real-time speech processing to enable contextual music triggering through natural conversation, addressing the fundamental gap between human emotional expression and automated music selection. Unlike traditional music retrieval systems that require explicit queries, our approach leverages Mistral-7B-Instruct-v0.3 with 4-bit quantization to understand conversational context and emotional undertones, automatically selecting appropriate musical accompaniment without disrupting dialogue flow. The system incorporates comprehensive Explainable AI components using LIME-based techniques to provide transparent reasoning for music selections, addressing critical trust and interpretability concerns in automated music systems. Through experiments on a synthetic dataset of 30,000 emotionally annotated songs, we demonstrate an average emotion detection confidence of 71% with successful tracking of emotional state transitions across conversational contexts. Our work establishes foundations for ambient musical intelligence with significant implications for therapeutic applications, particularly in speech therapy and dementia care, while maintaining ethical standards through the exclusive use of synthetic data to avoid copyright complications. We release the codebase and demo of Antash-system to facilitate reproducibility and future research: https://github.com/anonymous-gihub99/Antash-system.
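For orientation, the sketch below shows one way the 4-bit quantized Mistral-7B-Instruct-v0.3 backbone described in the abstract could be loaded and prompted for conversational emotion detection. Only the model identifier comes from the abstract; the quantization settings, prompt wording, label set, and generation parameters are illustrative assumptions and not the authors' implementation.

```python
# Minimal sketch (not the authors' code): load Mistral-7B-Instruct-v0.3 in 4-bit
# and ask it to label the dominant emotion of a conversational utterance.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"  # model named in the abstract

# 4-bit quantization config (NF4 and bfloat16 compute are assumptions, not from the paper)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)

def detect_emotion(utterance: str) -> str:
    """Return a one-word emotion label for an utterance (illustrative prompt and labels)."""
    messages = [
        {"role": "user",
         "content": ("Classify the dominant emotion of the speaker in one word "
                     "(e.g. joy, sadness, anger, calm, nostalgia):\n" + utterance)}
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=8, do_sample=False)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()

print(detect_emotion("I just got back from my grandmother's house; it brought back so many memories."))
```

In the full system, the detected label would then drive retrieval over the emotionally annotated song catalog and the LIME-based explanation step; those components are omitted from this sketch.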
Submission Number: 7