Abstract: Trustworthiness and deception recognition attract the research community's attention due to their relevant role in social negotiations and other areas. Despite the increasing interest in the field, many questions remain about how to perform automatic deception detection and which features best explain how people perceive trustworthiness. Previous studies have demonstrated that emotions and sentiments correlate with deception. However, few articles have employed deep-learning models pre-trained on emotion recognition tasks to predict trustworthiness. For this reason, this paper compares traditional statistical functional feature sets proposed for emotion recognition, such as eGeMAPS, with features extracted from deep-learning models pre-trained on emotion recognition tasks, such as AlexNet, CNN-14, and xlsr-Wav2Vec2.0. After obtaining each set of features, we train a Support Vector Machine (SVM) model on deception detection. These experiments provide a baseline for understanding how methodologies used in emotion recognition tasks can be applied to speech trustworthiness recognition. Using the eGeMAPS feature set for deception detection achieved an accuracy of 65.98% at the turn level, and applying transfer learning to the embeddings extracted from a pre-trained xlsr-Wav2Vec2.0 model improved this rate to 68.11%, surpassing the audio-modality baseline from previous work by 8.5%.
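To make the described pipeline concrete, the following is a minimal sketch (not the paper's implementation) of the transfer-learning setup: utterance-level embeddings are obtained by mean-pooling the hidden states of a pre-trained xlsr-Wav2Vec2.0 model and then fed to an SVM classifier. The checkpoint name, pooling strategy, and the `wav_files`/`labels` variables are illustrative assumptions, not details taken from the paper.

```python
# Sketch: xlsr-Wav2Vec2.0 embeddings + SVM for turn-level deception detection.
# Checkpoint, pooling, and data loading are assumptions for illustration only.
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

MODEL_NAME = "facebook/wav2vec2-large-xlsr-53"  # assumed pre-trained checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME).eval()

def embed(wav_path: str) -> torch.Tensor:
    """Return a fixed-size embedding by mean-pooling the last hidden states."""
    speech, sr = librosa.load(wav_path, sr=16000)  # wav2vec2 expects 16 kHz audio
    inputs = feature_extractor(speech, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, 1024)
    return hidden.mean(dim=1).squeeze(0)            # shape: (1024,)

# wav_files and labels stand in for the turn-level corpus and its annotations.
X = torch.stack([embed(path) for path in wav_files]).numpy()
y = labels  # e.g. 1 = deceptive, 0 = truthful

svm = SVC(kernel="rbf", C=1.0)
print(cross_val_score(svm, X, y, cv=5).mean())
```

The same SVM stage would apply unchanged to eGeMAPS functionals or to embeddings from AlexNet or CNN-14; only the feature-extraction step differs.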