Transcription-Based Lyrics Embeddings: Simple Extraction of Effective Lyrics Embeddings From Audio

Published: 2024, Last Modified: 16 Feb 2026, ISMIR 2024, CC BY-SA 4.0
Abstract: The majority of Western popular music contains lyrics. Previous studies have shown that lyrics are a rich source of information and are complementary to other information sources, such as audio. One factor that hinders the research and application of lyrics on a large scale is their availability. To mitigate this, we propose the use of transcription-based lyrics embeddings (TLE). These estimate 'ground-truth' lyrics embeddings given only audio as input. Central to this approach is the use of transcripts derived from an automatic lyrics transcription (ALT) system instead of human-transcribed, 'ground-truth' lyrics, making them substantially more accessible. We conduct an experiment to assess the effectiveness of TLEs across various music information retrieval (MIR) tasks. Our results indicate that TLEs can improve on the performance of audio embeddings alone, especially when the two are combined, closing the gap with cases where ground-truth lyrics information is available.