Learning Audio Features for Singer Identification and Embedding

Cheng-i Wang; George Tzanetakis

30 Jan 2018, ICLR 2018 Conference Withdrawn Submission, Readers: Everyone

Abstract: Neural networks are increasingly used for music information retrieval tasks. In this paper, we empirically investigate different ways of improving the performance of convolutional neural networks (CNNs) on spectral audio features. More specifically, we explore three aspects of CNN design: the depth of the network, the use of residual blocks together with grouped convolution, and global aggregation over time. The application context is singer classification and singing performance embedding, and we believe the conclusions extend to other types of music analysis using convolutional neural networks. The results show that, of the three, global time aggregation improves CNN performance the most. Another contribution of this paper is the release of a dataset of singing recordings that can be used for training and evaluation.

TL;DR: Using deep learning techniques on singing voice related tasks.

Keywords: convolutional neural networks, attention, music information retrieval
