HKDSME: Heterogeneous Knowledge Distillation for Semi-supervised Singing Melody Extraction Using Harmonic Supervision

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Oral, License: CC BY 4.0
Abstract: Singing melody extraction is a key task in music information retrieval (MIR). However, decades of research have uncovered two difficult issues. \emph{First}, binary classification on frequency-domain audio features (e.g., spectrograms) is regarded as the primary method, which ignores the potential associations among musical information at different frequency bins, as well as their varying significance for output decisions. \emph{Second}, existing semi-supervised singing melody extraction models ignore the accuracy of the pseudo labels they generate, which largely limits further improvement of the model. To address these two issues, we propose a \underline{h}eterogeneous \underline{k}nowledge \underline{d}istillation framework for \underline{s}emi-supervised singing \underline{m}elody \underline{e}xtraction using harmonic supervision, termed \emph{HKDSME}. We begin by proposing a four-class classification paradigm that uses harmonic supervision to determine the results of singing melody extraction. This enables the model to capture more information about melodic relations in spectrograms. To improve the accuracy of pseudo labels, we then build a semi-supervised method that leverages the extracted harmonics as a consistent regularization. Unlike previous methods, it judges the availability of unlabeled data in terms of the inner positional relations of the extracted harmonics. To further build a lightweight semi-supervised model, we propose a heterogeneous knowledge distillation (HKD) module, which enables the transfer of prior knowledge between heterogeneous models. We also propose a novel confidence-guided loss, which works together with the proposed HKD module to reduce incorrect pseudo labels. We evaluate the proposed method on several well-known publicly available datasets, and the results demonstrate its efficacy.
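As a rough illustration of how a confidence-guided distillation loss can suppress unreliable pseudo labels, the following is a minimal sketch of a generic confidence-weighted knowledge distillation loss. It is not the formulation used in HKDSME, which the abstract does not specify. The function names, the temperature, the confidence threshold, and the use of the teacher's maximum class probability as the confidence score are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_weighted_kd_loss(teacher_logits, student_logits,
                                temperature=2.0, threshold=0.5):
    """Confidence-weighted mean of KL(teacher || student) over frames.

    Both arguments are lists of per-frame logit lists (hypothetically one
    logit per class in a four-class scheme). Frames whose teacher confidence
    (assumed here to be the maximum unsoftened class probability) is below
    `threshold` get weight 0; the rest are weighted by that confidence.
    """
    total, weight_sum = 0.0, 0.0
    for t_log, s_log in zip(teacher_logits, student_logits):
        confidence = max(softmax(t_log))
        weight = confidence if confidence >= threshold else 0.0
        p = softmax(t_log, temperature)  # softened teacher distribution
        q = softmax(s_log, temperature)  # softened student distribution
        kl = sum(pi * (math.log(pi) - math.log(max(qi, 1e-12)))
                 for pi, qi in zip(p, q) if pi > 0.0)
        total += weight * kl
        weight_sum += weight
    # If every frame was rejected, there is nothing to distill.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    teacher = [[4.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0]]
    student = [[1.0, 2.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]]
    # The second frame has a near-uniform teacher output, so it is ignored.
    print(confidence_weighted_kd_loss(teacher, student))
```

In this sketch, a hard threshold discards low-confidence frames entirely, while the remaining frames contribute in proportion to teacher confidence; other weighting schemes would fit the same description equally well.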
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This work contributes to music information retrieval, an active research field in multimedia processing. It proposes a novel heterogeneous knowledge distillation framework for semi-supervised singing melody extraction using harmonic supervision. The proposed harmonic supervision method is inspired by the intrinsic characteristics of polyphonic music and is expected to benefit other music information retrieval applications. In addition, the proposed heterogeneous knowledge distillation method for semi-supervised learning is expected to benefit other multimedia applications.
Supplementary Material: zip
Submission Number: 3244