Smoothing Model Predictions Using Adversarial Training Procedures for Speech Based Emotion Recognition
Abstract: Training a discriminative classifier involves learning a conditional distribution p(y_i | x_i) from a set of feature vectors x_i and their corresponding labels y_i, i = 1, ..., N. For the classifier to generalize rather than overfit the training data, this conditional distribution should vary smoothly over the inputs x_i. Adversarial training procedures enforce this smoothness through manifold regularization, which makes the model's output distribution more robust to local perturbations added to a data point x_i. In this paper, we apply adversarial training procedures to improve the accuracy of a deep neural network based emotion recognition system that uses speech cues. Specifically, we investigate two training procedures: (i) adversarial training, where the adversarial direction is determined from the given labels of the training data, and (ii) virtual adversarial training, where the adversarial direction is determined only from the model's output distribution on the training data. We demonstrate the efficacy of these procedures with a k-fold cross-validation experiment on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a cross-corpus performance analysis on three separate corpora. Results show improvement over a purely supervised approach, as well as better generalization in cross-corpus settings.
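The two procedures named in the abstract differ only in how the perturbation direction is found. The sketch below illustrates that difference on a toy linear softmax classifier with analytic gradients. It is an assumption made for brevity, not the deep network or the training setup used in the paper. All names (`adversarial_perturbation`, `virtual_adversarial_perturbation`, `eps`, `xi`, `n_iter`) are hypothetical, and the power-iteration step follows the commonly used formulation of virtual adversarial training rather than anything stated in the abstract.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def predict(W, b, x):
    # Toy stand-in for the network: p(y | x) = softmax(W x + b).
    return softmax(W @ x + b)


def cross_entropy(W, b, x, y):
    # Supervised loss for one example with integer label y.
    return float(-np.log(predict(W, b, x)[y]))


def kl_divergence(p, q):
    # KL(p || q) between two discrete distributions.
    return float(np.sum(p * (np.log(p) - np.log(q))))


def adversarial_perturbation(W, b, x, y, eps):
    # (i) Adversarial direction: uses the given label y.
    # For a linear softmax model, d(cross-entropy)/dx = W^T (p - onehot(y)).
    p = predict(W, b, x)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad = W.T @ (p - onehot)
    return eps * grad / (np.linalg.norm(grad) + 1e-12)


def virtual_adversarial_perturbation(W, b, x, eps, xi=1e-3, n_iter=1, rng=None):
    # (ii) Virtual adversarial direction: uses only the model's own output.
    # Start from a random unit vector and refine it by power iteration on
    # the gradient of KL(p(.|x) || p(.|x + r)) with respect to r.
    rng = np.random.default_rng(0) if rng is None else rng
    p = predict(W, b, x)
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    for _ in range(n_iter):
        q = predict(W, b, x + xi * d)
        grad = W.T @ (q - p)  # analytic gradient for the linear softmax model
        d = grad / (np.linalg.norm(grad) + 1e-12)
    return eps * d


# Small demonstration with random parameters (illustrative values only).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))  # 4 classes, 6 input features
b = np.zeros(4)
x = rng.standard_normal(6)
y = 2

r_adv = adversarial_perturbation(W, b, x, y, eps=0.5)
r_vadv = virtual_adversarial_perturbation(W, b, x, eps=0.5, rng=rng)

# Regularization terms that would be added to the supervised loss.
adv_loss = cross_entropy(W, b, x + r_adv, y)
vadv_loss = kl_divergence(predict(W, b, x), predict(W, b, x + r_vadv))
```

In a training loop, `adv_loss` or `vadv_loss` would typically be added to the supervised loss with a weighting coefficient. Because the virtual adversarial term needs no label, it can in principle also be computed on unlabeled inputs.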