Abstract:
Accurately predicting the popularity of a music track is a critical challenge in the music industry, given the potential benefits to artists, producers, and streaming platforms. Historically, research on music success has focused on factors such as audio features and extrinsic metadata (e.g., artist demographics, listener trends), or on advancing prediction model architectures. This paper addresses the under-explored use of lyrical content to predict music popularity. We present a novel automated pipeline that uses LLMs to extract mathematical representations from lyrics, capturing their semantic and syntactic structure while preserving sequential information. These features are then integrated into a novel multimodal architecture, HitMusicLyricNet, which combines audio, lyrics, and social metadata to predict a popularity score. Our method outperforms the available end-to-end deep learning baseline for music popularity prediction on the SpotGenTrack (SPD) dataset, achieving overall improvements of 9% in MAE and 20% in MSE. We confirm that these improvements result from the introduction of our lyrics feature engineering pipeline, LyricsAENet, into the HitMusicLyricNet architecture.
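The abstract does not give implementation details, so the following is only a minimal illustrative sketch of how a lyrics autoencoder could feed a multimodal fusion model of this kind in PyTorch. All feature dimensions, layer sizes, and class names here are assumptions for illustration, not the authors' LyricsAENet or HitMusicLyricNet implementation.

```python
# Illustrative sketch only: a toy lyrics autoencoder feeding a multimodal
# popularity regressor. Dimensions and layer choices are assumptions.
import torch
import torch.nn as nn

class LyricsAutoencoderSketch(nn.Module):
    """Compresses LLM-derived lyric embeddings into a low-dimensional latent code."""
    def __init__(self, in_dim=768, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class MultimodalPopularitySketch(nn.Module):
    """Fuses audio, lyric-latent, and social-metadata features to regress a popularity score."""
    def __init__(self, audio_dim=32, lyric_dim=64, meta_dim=16):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + lyric_dim + meta_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),  # popularity score constrained to [0, 1]
        )

    def forward(self, audio, lyric_latent, meta):
        fused = torch.cat([audio, lyric_latent, meta], dim=-1)
        return self.head(fused).squeeze(-1)

# Example usage with random tensors standing in for real features.
ae = LyricsAutoencoderSketch()
model = MultimodalPopularitySketch()
lyric_emb = torch.randn(8, 768)       # hypothetical LLM-derived lyric representations
_, lyric_latent = ae(lyric_emb)
audio = torch.randn(8, 32)            # hypothetical audio features
meta = torch.randn(8, 16)             # hypothetical social-metadata features
scores = model(audio, lyric_latent, meta)
print(scores.shape)                   # torch.Size([8])
```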
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Multimodal applications
Languages Studied: English
Submission Number: 1683