ByteHum: Fast and Accurate Query-by-Humming in the Wild

Published: 01 Jan 2024, Last Modified: 14 Jul 2025, Venue: ICASSP 2024, License: CC BY-SA 4.0
Abstract: Query by Humming (QBH) is a practically meaningful task, but most existing methods struggle to scale to real-life applications because building the database requires complex preprocessing and search speed is limited. In this paper, we propose ByteHum, a fast and efficient humming retrieval system capable of searching large-scale databases built on raw song audio without extensive preprocessing. ByteHum employs a convolutional neural network to extract features from raw audio and uses a source-separated cover song identification dataset for weakly supervised training of the feature extractor. We explore unsupervised domain adaptation techniques to enhance the performance of our weakly supervised model on the QBH task. Furthermore, to evaluate QBH systems’ performance in the wild on databases that have not been manually processed, we annotate original recordings for three existing QBH benchmark sets. Our experimental results demonstrate that ByteHum significantly outperforms existing QBH systems in terms of speed and accuracy under both classical and unconstrained settings.