Abstract: Local key estimation from music audio recordings is a challenging task. Owing to its complexity and inherent ambiguity, machine-learning methods often overfit to specific pieces and their annotations and therefore lack robustness and generalizability. Building on a previous case study on the Schubert Winterreise dataset, this paper aims to develop a robust local key estimation method. To this end, we propose a novel neural network architecture, OctaveNet, which is inspired by the musical relationship between frequency bins in the constant-Q transform (CQT) and by the ability of recurrent layers to process sequential data. OctaveNet rearranges the CQT spectrogram in two different ways, processes each of the resulting branches with convolutional and recurrent layers, and finally fuses the two feature maps to predict the local key. Our results show that, despite having fewer parameters, OctaveNet achieves a substantial improvement over previous methods, especially for unseen songs, which indicates stronger generalizability.
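The abstract does not specify the two CQT rearrangements. One plausible reading of the "musical relationship of frequency bins" is that CQT bins repeat with a period of one octave, so a spectrogram can be folded into an (octave × pitch-class) grid and viewed either octave-by-octave or pitch-class-by-pitch-class. The NumPy sketch below illustrates only that folding idea. It assumes the CQT bins are ordered from low to high, that the number of bins is a multiple of `bins_per_octave`, and that "pitch class" means the bin offset relative to the lowest CQT bin. The function name `octave_views` is hypothetical and is not taken from the paper.

```python
import numpy as np


def octave_views(cqt: np.ndarray, bins_per_octave: int = 12):
    """Fold a CQT spectrogram into two octave-aware views (illustrative sketch).

    Assumes cqt has shape (n_bins, n_frames), bins ordered low to high,
    and n_bins divisible by bins_per_octave. This is a hypothetical
    rearrangement, not the exact one used by OctaveNet.
    """
    n_bins, n_frames = cqt.shape
    if n_bins % bins_per_octave != 0:
        raise ValueError("n_bins must be a multiple of bins_per_octave")
    n_octaves = n_bins // bins_per_octave

    # View 1: by_octave[o, p, t] == cqt[o * bins_per_octave + p, t],
    # i.e. one slice per octave with pitch classes along axis 1.
    by_octave = cqt.reshape(n_octaves, bins_per_octave, n_frames)

    # View 2: by_pitch_class[p, o, t] == cqt[o * bins_per_octave + p, t],
    # i.e. one slice per pitch class with octaves along axis 1.
    by_pitch_class = by_octave.transpose(1, 0, 2)
    return by_octave, by_pitch_class


if __name__ == "__main__":
    # Toy "spectrogram": 3 octaves of 12 bins, 4 time frames.
    cqt = np.arange(36 * 4, dtype=float).reshape(36, 4)
    by_oct, by_pc = octave_views(cqt)
    print(by_oct.shape, by_pc.shape)
```

Because both views are produced with `reshape` and `transpose`, they share memory with the input, so each branch of a two-branch network could consume its own view without copying the spectrogram.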