Abstract: Speech emotion recognition is a challenging task due to the ambiguity of emotion, which makes it difficult to learn the features of emotion data using machine learning algorithms. However, previous studies conventionally ignore the ambiguity of emotion and treat the emotion data as the same difficulty level, which results in low recognition accuracy. Motivated by human and animal learning studies, we propose a novel method named Progressive Co-teaching (PCT) to learn speech emotion features from simple to difficult. PCT method automatically identifies the difficulty level of data by itself using loss values, and then each network exchanges easy instances with small loss to peer network for early training. The rest instances with large loss are added gradually for later training. The experiment results demonstrate that our method achieves an improvement of 3.8% and 1.27% on MAS and IEMOCAP database than the state-of-the-arts, respectively.
0 Replies
Loading