Abstract: Audio tagging aims to assign tags to an audio chunk, and it has attracted increasing attention because of its potential applications. Deep learning has been successfully applied to the domestic audio tagging task. However, the performance of deep models relies heavily on the choice of hyper-parameters, such as the filter size in the convolutional layers. Recently, Neural Architecture Search (NAS) has been successfully applied to design deep model architectures for a specified learning task. In this paper, we explore neural architecture search for domestic audio tagging. We propose to use a Convolutional Recurrent Neural Network (CRNN) with Attention and Location (ATT-LOC) as the audio tagging model. We then apply NAS to search for the optimal number of filters and the filter size. Finally, we run a grid search over the mixup augmentation coefficient, the input size of the spectrogram, and the batch size to further improve the classification results. In our experiments, the architecture found by automatic search achieves an equal error rate of 0.095 on the DCASE 2016 Task 4 dataset, improving on the CRNN baseline's 0.10. In addition, the architecture found by NAS converges faster in training than the CRNN baseline.
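The results above are reported as an equal error rate (EER), the operating point at which the false positive rate equals the false negative rate. The following is a minimal sketch of how an EER could be estimated for a single tag from binary labels and model scores. The function name `equal_error_rate`, the threshold sweep, and the tie-handling are illustrative assumptions, not the evaluation code used in the paper, and the abstract does not state how per-tag values are aggregated.

```python
def equal_error_rate(labels, scores):
    """Estimate the EER for one binary tag (illustrative sketch, not the paper's code).

    labels: iterable of 0/1 ground-truth values for the tag.
    scores: iterable of model scores, higher meaning "tag present".
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    best_gap, eer = None, None
    # Sweep every distinct score as a threshold, plus +inf (predict nothing).
    for threshold in sorted(set(scores)) + [float("inf")]:
        # A clip is predicted positive when its score is >= threshold.
        fpr = sum(s >= threshold for s in negatives) / len(negatives)
        fnr = sum(s < threshold for s in positives) / len(positives)
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy data only; these are not results from the paper.
separable = equal_error_rate([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
overlapping = equal_error_rate([0, 1, 0, 1], [0.1, 0.4, 0.6, 0.9])
```

With a finite set of scores the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; interpolating along the ROC curve is a common alternative.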