Utilizing Cross-Version Consistency for Domain Adaptation: A Case Study on Music Audio

Published: 19 Mar 2024, Last Modified: 06 May 2024, Venue: Tiny Papers @ ICLR 2024 Notable, License: CC BY 4.0
Keywords: domain adaptation, music processing, teacher-student learning, cross-version consistency
TL;DR: We propose to use cross-version consistency for domain adaptation and present a case study on music audio transcription using the proposed strategy.
Abstract: Deep learning models are commonly trained on large annotated corpora, often from a specific domain. Generalizing to another domain for which no annotated data is available is usually challenging. In this paper, we address such unsupervised domain adaptation based on the teacher-student learning paradigm. To improve efficacy in the target domain, we propose to exploit cross-version scenarios, i.e., corresponding data pairs that are assumed to share the same, yet unknown, labels. More specifically, our idea is to compare teacher annotations across versions and to use only consistent annotations as labels for training the student model. Examples of cross-version data include the same text read by different speakers (in speech recognition) or the same character written by different writers (in handwritten text recognition). In our case study on music audio, versions are different recorded performances of the same composition, aligned with music synchronization techniques. Taking pitch estimation (a multi-label classification task) as an example, we show that enforcing consistency across versions during student training helps to improve the transfer from a source domain (piano) to unseen and more complex target domains (singing/orchestra).
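The consistency idea described in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation. It assumes that the two versions have already been aligned frame by frame, that the teacher outputs per-frame pitch probabilities, and that "consistent" means both versions agree after thresholding. The function name `consistent_labels`, the threshold of 0.5, and the frames-by-pitches list layout are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def consistent_labels(
    teacher_a: Matrix, teacher_b: Matrix, threshold: float = 0.5
) -> Tuple[List[List[int]], List[List[int]]]:
    """Derive pseudo-labels from teacher outputs on two aligned versions.

    teacher_a, teacher_b: teacher probabilities, shape (frames, pitches),
    assumed to be already synchronized so that row i of both matrices
    refers to the same musical position (hypothetical input layout).

    Returns (labels, mask): labels holds the binarized annotation, and
    mask is 1 where both versions agree and 0 where they disagree.
    """
    if len(teacher_a) != len(teacher_b):
        raise ValueError("versions must have the same number of aligned frames")

    labels, mask = [], []
    for row_a, row_b in zip(teacher_a, teacher_b):
        if len(row_a) != len(row_b):
            raise ValueError("versions must have the same number of pitch bins")
        label_row, mask_row = [], []
        for p_a, p_b in zip(row_a, row_b):
            active_a = p_a >= threshold
            active_b = p_b >= threshold
            agree = active_a == active_b
            # Keep the annotation only where both versions agree.
            label_row.append(1 if (agree and active_a) else 0)
            mask_row.append(1 if agree else 0)
        labels.append(label_row)
        mask.append(mask_row)
    return labels, mask
```

Under these assumptions, a student would be trained only on entries where `mask` is 1, for example by multiplying a per-entry binary cross-entropy loss by the mask before averaging.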
Submission Number: 215