ALCAP: Alignment-Augmented Music CaptionerDownload PDF

Anonymous

17 Feb 2023 (modified: 05 May 2023)ACL ARR 2023 February Blind SubmissionReaders: Everyone
Abstract: Growing popularity of streaming media platforms for music search and recommendations has led to a need for novel methods for interpreting music that take into account both lyrics and audio. However, many previous works focus on refining individual components of encoder-decoder architecture that maps music to caption tokens, ignoring the potential benefits of correspondence between audio and lyrics. In this paper, we propose to explicitly learn the multimodal alignment through contrastive learning. By learning audio-lyrics correspondence, the model is guided to learn better cross-modal consistency, thus generating high-quality captions. We provide both theoretical and empirical results demonstrating the advantage of the proposed method, and achieve new state-of-the-art on two music captioning datasets.
Paper Type: long
Research Area: NLP Applications
0 Replies

Loading