Abstract: In this work we evaluate the zero-shot performance of deep spectrum feature extractors, pre-trained on snore-analysis datasets, for emotion classification in audio recordings. We assess whether these features are suitable for emotion analysis in a zero-shot setting, i.e. without re-training the CNNs. Once feature extraction is complete, we train conventional classification models, i.e. a support vector classifier (SVC), on a series of Spanish datasets consisting of male and female recordings expressing a set of emotions.
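The pipeline described above (frozen pre-trained CNN as a deep spectrum feature extractor, followed by a conventional SVC) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the snore-trained CNN weights and the Spanish emotion datasets are not reproduced here, so a generic torchvision ResNet-18 backbone and random placeholder spectrograms and labels stand in for them.

```python
# Minimal sketch of the described pipeline. The snore-trained CNN and the
# Spanish emotion datasets from the paper are not available here; a generic
# torchvision backbone and random placeholder data are assumptions, and only
# the structure (frozen CNN features -> SVC) mirrors the abstract.
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Frozen feature extractor: CNN kept in eval mode, no fine-tuning (zero-shot premise).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # expose the 512-d penultimate features
backbone.eval()

def deep_spectrum_features(spectrograms: torch.Tensor) -> np.ndarray:
    """Map a batch of 3-channel spectrogram images to deep spectrum feature vectors."""
    with torch.no_grad():
        return backbone(spectrograms).numpy()

# Placeholder data: 40 spectrogram "images" and binary emotion labels (illustrative only).
specs = torch.rand(40, 3, 224, 224)
labels = np.random.randint(0, 2, size=40)

features = deep_spectrum_features(specs)

# Conventional classifier trained on top of the frozen features, as in the abstract.
svc = SVC(kernel="linear")
scores = cross_val_score(svc, features, labels, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```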