PCQ: Emotion Recognition in Speech via Progressive Channel Querying

Published: 01 Jan 2024, Last Modified: 29 Sept 2024 · ICIC (3) 2024 · CC BY-SA 4.0
Abstract: In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions. Traditional SER methods struggle to capture the long-term temporal correlations and dynamic variations found in complex emotional expressions. To address these limitations, we introduce PCQ, a new SER method based on Progressive Channel Querying. PCQ uses channel queries to analyze features progressively, layer by layer, along the channel dimension, which lets it dynamically model the long-term emotional context of an utterance. This multi-level analysis helps PCQ capture subtle nuances of human emotion. Experiments on the IEMOCAP and EMODB emotion recognition datasets show that our model improves weighted average (WA) accuracy by 3.98% and 3.45% and unweighted average (UA) accuracy by 5.67% and 5.83%, respectively, clearly exceeding the baseline.
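The abstract reports results as WA and UA but does not define them. The sketch below assumes the convention commonly used in SER evaluation: WA is overall accuracy across all utterances, and UA is the mean of per-class recalls, so that every emotion class counts equally on class-imbalanced datasets. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """WA (assumed definition): fraction of all utterances classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA (assumed definition): mean per-class recall, each class weighted equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals = Counter(y_true)  # number of utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy, imbalanced label set: "neutral" dominates.
    y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
    y_pred = ["neutral"] * 6 + ["neutral", "angry"] + ["neutral", "neutral"]
    print(weighted_accuracy(y_true, y_pred))    # prints 0.7
    print(unweighted_accuracy(y_true, y_pred))  # prints 0.5
```

In this toy case, a model that favours the majority class still scores 0.7 WA but only 0.5 UA, which is why SER papers usually report both.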