Keywords: lyrics alignment, audio signal processing, open-source code, open-source dataset
Abstract: Recent multimodal methods for lyrics alignment have relied on large datasets. Our approach introduces a box loss that directly incorporates timestamp information into the loss function, enabling precise alignment and competitive results even with limited training data. We also address the noise present in the public DALI dataset, conducting a thorough cleaning process to improve the quality of training data. Finally, we propose JamendoLyrics++, a substantial extension of the common JamendoLyrics evaluation dataset, offering improved genre diversity for better evaluation of lyrics alignment systems.
Submission Number: 30
Loading