Keywords: audio, classification, convolutional neural network, deep learning, filter, filter-bank, raw waveform
TL;DR: A new convolutional layer whose kernels are audio signal-processing filters defined by only a few learnable parameters.
Abstract: We propose and investigate the design of a new convolutional layer whose kernels are parameterized functions. This layer is intended to serve as the input layer of convolutional neural networks for audio applications. The kernels are defined as functions with a band-pass filter shape and only a small number of trainable parameters. We show that networks with such an input layer can achieve state-of-the-art accuracy on several audio classification tasks. This approach reduces both the number of weights to be trained and the network training time, and it enables larger kernel sizes, an advantage for audio applications. Furthermore, the learned filters add interpretability and give a better understanding of the data properties exploited by the network.
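The abstract does not say which band-pass function the authors use. As a minimal sketch, assuming a windowed-sinc band-pass shape (a common choice swapped in here purely for illustration, not necessarily the paper's), the NumPy code below builds a bank of kernels from two parameters per filter (lower cutoff and bandwidth) and applies it to a raw waveform. The names `bandpass_kernel` and `ParametricBandPassLayer`, the linear initialization, and the Hamming window are hypothetical; gradient-based training of the cutoffs is omitted.

```python
import numpy as np


def bandpass_kernel(f_low, f_high, size, sample_rate):
    """Hypothetical band-pass kernel: windowed sinc with cutoffs f_low < f_high (Hz).

    Assumption: the band-pass shape is the difference of two ideal low-pass
    (sinc) responses, tapered by a Hamming window. The paper's own function
    may differ.
    """
    # Sample indices centred on 0, so the kernel is symmetric (linear phase).
    n = np.arange(size) - (size - 1) / 2

    # Discrete ideal low-pass: h[n] = (2 fc / fs) * sinc(2 fc n / fs),
    # where np.sinc(x) = sin(pi x) / (pi x).
    def low_pass(fc):
        return 2 * fc / sample_rate * np.sinc(2 * fc * n / sample_rate)

    return (low_pass(f_high) - low_pass(f_low)) * np.hamming(size)


class ParametricBandPassLayer:
    """Hypothetical input layer: each kernel is set by 2 values, not kernel_size weights."""

    def __init__(self, n_filters, kernel_size, sample_rate, f_min=30.0):
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Assumed initialization: contiguous bands spaced linearly up to 90% of Nyquist.
        edges = np.linspace(f_min, 0.45 * sample_rate, n_filters + 1)
        self.f_low = edges[:-1].copy()   # trainable: lower cutoff per filter (Hz)
        self.bandwidth = np.diff(edges)  # trainable: bandwidth per filter (Hz)

    def num_parameters(self):
        # Independent of kernel_size, so longer kernels cost no extra weights.
        return self.f_low.size + self.bandwidth.size

    def kernels(self):
        # abs() keeps cutoffs positive if training pushed them negative; cap at Nyquist.
        f_low = np.abs(self.f_low)
        f_high = np.minimum(f_low + np.abs(self.bandwidth), self.sample_rate / 2)
        return np.stack([bandpass_kernel(lo, hi, self.kernel_size, self.sample_rate)
                         for lo, hi in zip(f_low, f_high)])

    def __call__(self, waveform):
        # Kernels are symmetric, so convolution equals the cross-correlation used in CNNs.
        return np.stack([np.convolve(waveform, k, mode="same") for k in self.kernels()])


if __name__ == "__main__":
    sr = 16000
    layer = ParametricBandPassLayer(n_filters=40, kernel_size=1025, sample_rate=sr)
    print(layer.num_parameters())  # 80, versus 40 * 1025 = 41000 free weights in a plain conv layer
    x = np.random.default_rng(0).standard_normal(sr)  # one second of noise as a stand-in waveform
    print(layer(x).shape)          # (40, 16000)
```

In this sketch the parameter count stays at two per filter however long the kernel is, which illustrates why such a layer makes large kernels cheap; the learned cutoffs in `f_low` and `bandwidth` can also be read off directly as frequency bands, which is one way the interpretability claim could be inspected.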
Code: https://app.box.com/s/vh5u7mpwrllhuqr8yl9jobohjrrjw797