Fusing transcription results from polyphonic and monophonic audio for singing melody transcription in polyphonic music

Published: 01 Jan 2017, Last Modified: 31 Jul 2025, Venue: ICASSP 2017, License: CC BY-SA 4.0
Abstract: This paper presents a new system for singing melody transcription from polyphonic songs. Instead of operating solely on the polyphonic audio of each song (as most existing systems do), our system additionally takes as input multiple monophonic recordings of people singing the song. To transcribe the singing melody, the system first tracks the singing pitch in the polyphonic audio using a deep neural network (DNN)-based method, and then uses the estimated pitch series as a reference to select among the pitch sequences extracted from the monophonic singing recordings. The selected monophonic pitch sequences and the DNN pitch series from the polyphonic audio are then transcribed separately, and their transcription results are fused to form the final note sequence. Experimental results show that introducing monophonic singing recordings into transcription significantly improves the performance of singing melody transcription from polyphonic songs.
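The select, transcribe, and fuse stages described in the abstract can be illustrated with a small sketch. The abstract does not state the selection criterion, the transcription method, or the fusion rule, so everything here is an assumption made for illustration: pitch is given per frame as a fractional MIDI number (`None` for unvoiced frames), candidates are kept when their mean absolute deviation from the reference is under a hypothetical threshold `max_distance`, transcription is plain rounding to the nearest semitone, and fusion is a frame-wise majority vote. The DNN pitch tracker is not modeled; its output is simply supplied as `dnn_pitch`.

```python
from collections import Counter


def pitch_distance(reference, candidate):
    """Mean absolute difference (semitones) over frames voiced in both series."""
    diffs = [abs(r - c) for r, c in zip(reference, candidate)
             if r is not None and c is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def select_candidates(reference, candidates, max_distance=2.0):
    """Keep monophonic pitch sequences close to the reference (assumed criterion)."""
    return [c for c in candidates if pitch_distance(reference, c) <= max_distance]


def transcribe(pitch_series):
    """Naive stand-in for note transcription: round voiced frames to MIDI notes."""
    return [None if p is None else round(p) for p in pitch_series]


def fuse(transcriptions):
    """Frame-wise majority vote (assumed rule); ties go to the earliest input."""
    fused = []
    for frame in zip(*transcriptions):
        counts = Counter(frame)
        best = max(counts.values())
        fused.append(next(v for v in frame if counts[v] == best))
    return fused


def to_notes(frames):
    """Collapse runs of equal frames into (onset_frame, length, midi_note) tuples."""
    notes, start = [], 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or frames[i] != frames[start]:
            if frames[start] is not None:
                notes.append((start, i - start, frames[start]))
            start = i
    return notes


# Toy data (invented for illustration, not from the paper).
dnn_pitch = [60.2, 60.1, 67.0, 62.3, 62.1, 64.1]   # frame 2 is a tracking error
mono_a = [60.0, 60.0, 60.1, 62.0, 62.0, 64.0]
mono_b = [59.9, 60.1, 60.0, 62.1, 62.0, 64.0]
mono_c = [65.0, 65.0, 65.0, 67.0, 67.0, 69.0]      # sung in a different key

selected = select_candidates(dnn_pitch, [mono_a, mono_b, mono_c])
fused = fuse([transcribe(dnn_pitch)] + [transcribe(s) for s in selected])
notes = to_notes(fused)
print(notes)  # [(0, 3, 60), (3, 2, 62), (5, 1, 64)]
```

In this toy case the off-key recording is rejected by the reference check, and the two remaining monophonic transcriptions outvote the tracking error in the polyphonic pitch series at frame 2. A real system would also need to time-align the recordings and handle key or octave differences, which this sketch ignores.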