Learning auditory neural representations for emotion recognition

Pablo V. A. Barros, Cornelius Weber, Stefan Wermter

Published: 2016, Last Modified: 20 May 2025, IJCNN 2016, CC BY-SA 4.0

Abstract: Auditory emotion recognition has become an important research topic in recent years. However, even after the development of several architectures and frameworks, generalization remains a major problem. Our model examines the capability of deep neural networks to learn specific features for two kinds of auditory emotion recognition: speech-based and music-based. We propose a cross-channel architecture to improve generalization in complex auditory recognition by integrating previously learned specific representations into a high-level auditory descriptor. We evaluate our models on the SAVEE dataset, the GTZAN dataset, and the EmotiW corpus, and show results comparable to state-of-the-art approaches.
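
The cross-channel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the random weights standing in for previously learned ones, the use of plain dense layers, and the choice to feed the same feature vector to both channels are all assumptions made for illustration. The names `dense`, `random_layer`, and `describe` are hypothetical.

```python
import random

def dense(inputs, weights, biases):
    """Fully connected layer with ReLU: one output per weight row."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_out, n_in, rng):
    """Hypothetical stand-in for weights that would come from training."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
N_FEATURES = 8  # assumed size of an input audio feature vector
N_CHANNEL = 4   # assumed size of each channel's representation
N_CROSS = 3     # assumed size of the high-level descriptor

# Two channels standing in for previously learned speech and music representations.
speech_channel = random_layer(N_CHANNEL, N_FEATURES, rng)
music_channel = random_layer(N_CHANNEL, N_FEATURES, rng)
# Cross-channel layer that integrates both representations.
cross_channel = random_layer(N_CROSS, 2 * N_CHANNEL, rng)

def describe(features):
    """Map one feature vector to a high-level descriptor via both channels."""
    speech_repr = dense(features, *speech_channel)
    music_repr = dense(features, *music_channel)
    # Concatenate the channel outputs before the integrating layer.
    return dense(speech_repr + music_repr, *cross_channel)

descriptor = describe([rng.uniform(0, 1) for _ in range(N_FEATURES)])
print(len(descriptor))  # 3
```

In this sketch the two channel layers would be kept fixed after their own training, and only the cross-channel layer (plus a classifier on top of the descriptor) would be trained on the target task.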