Towards Conditional Adversarial Training for Predicting Emotions from Speech

Published: 01 Jan 2018 · Last Modified: 08 Apr 2025 · ICASSP 2018 · License: CC BY-SA 4.0
Abstract: Motivated by the encouraging results recently obtained by generative adversarial networks in various image processing tasks, we propose a conditional adversarial training framework to predict dimensional representations of emotion, i.e., arousal and valence, from speech signals. The framework consists of two networks trained in an adversarial manner: the first network tries to predict emotion from acoustic features, while the second network aims to distinguish between the predictions provided by the first network and the emotion labels from the database, using the acoustic features as conditional information. We evaluate the performance of the proposed conditional adversarial training framework on the widely used emotion database RECOLA. Experimental results show that the proposed training strategy outperforms the conventional training method and is comparable with, or even superior to, other recently reported approaches, including deep and end-to-end learning.
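The abstract describes a two-player setup: a predictor maps acoustic features to arousal/valence values, and a discriminator, conditioned on the same acoustic features, tries to tell the predictor's outputs apart from the gold labels. Below is a minimal PyTorch sketch of that conditional adversarial training loop. The network sizes, feature dimensionality, learning rates, and the binary cross-entropy objective are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

FEATURE_DIM = 88   # assumed acoustic feature dimensionality (illustrative)
EMOTION_DIM = 2    # arousal and valence

# Predictor: acoustic features -> dimensional emotion estimates.
predictor = nn.Sequential(
    nn.Linear(FEATURE_DIM, 64), nn.ReLU(),
    nn.Linear(64, EMOTION_DIM), nn.Tanh(),  # labels assumed scaled to [-1, 1]
)

# Conditional discriminator: scores an (acoustic features, emotion) pair as
# "gold label" vs. "predictor output"; the features are the conditioning input.
discriminator = nn.Sequential(
    nn.Linear(FEATURE_DIM + EMOTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

bce = nn.BCEWithLogitsLoss()
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def train_step(features, labels):
    """One adversarial update on a batch of (acoustic features, gold labels)."""
    batch = features.size(0)
    real, fake = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: separate gold labels from predictions,
    # both paired with the same conditioning features.
    with torch.no_grad():
        preds = predictor(features)
    d_loss = (bce(discriminator(torch.cat([features, labels], dim=1)), real)
              + bce(discriminator(torch.cat([features, preds], dim=1)), fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Predictor step: fool the discriminator, i.e., push predictions toward
    # being indistinguishable from gold labels given the features.
    preds = predictor(features)
    g_loss = bce(discriminator(torch.cat([features, preds], dim=1)), real)
    opt_p.zero_grad(); g_loss.backward(); opt_p.step()
    return d_loss.item(), g_loss.item()

# Toy usage on random data:
if __name__ == "__main__":
    x = torch.randn(32, FEATURE_DIM)
    y = torch.rand(32, EMOTION_DIM) * 2 - 1
    for _ in range(5):
        print(train_step(x, y))
```

Note that this sketch omits the regression loss a practical system would likely combine with the adversarial term; it is intended only to make the predictor/discriminator interplay from the abstract concrete.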