Multimodal Emotion Recognition Using CNN-SVM with Data Augmentation

Gengyuan Guo, Pengzhi Gao, Xiangwei Zheng, Cun Ji

Published: 2022 (BIBM 2022). Last modified: 05 Feb 2025. License: CC BY-SA 4.0

Abstract: With the development of human-computer interaction and mobile sensors, emotion recognition based on physiological signals has attracted growing attention from researchers. The main difficulty is the small amount of available data, which leads to poor training results. In this paper, we propose a multimodal emotion recognition method using CNN-SVM and data augmentation (CSDAMER). Electrocardiography (ECG), galvanic skin response (GSR) and respiration (RSP) signals are used as input data; they place fewer demands on the recording environment and can be collected by mobile sensors. To improve model training, the data are augmented with transformations such as inversion, recombination and noise injection. The convolutional layer of a convolutional neural network (CNN) is then used to extract high-level features from the physiological signals, and these features are fed into a support vector machine (SVM) classifier to obtain the recognition results. Experimental results show that CSDAMER achieves 80.7% and 79.92% accuracy for arousal and valence, respectively. Compared with CNN alone, the accuracy for arousal and valence increases by 12.87% and 9.95%, respectively, and adding data augmentation improves the accuracy for arousal and valence by 21.94% and 25.73%, respectively.
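The abstract names three augmentation transforms but does not define them. The following is a minimal sketch of one possible reading in plain Python, not the authors' implementation. It assumes that "inversion" means flipping the sign of every sample (time reversal is another plausible reading), that "recombination" means cutting a signal window into contiguous segments and concatenating them in a random order, and that "noise injection" adds zero-mean Gaussian noise scaled to the window's standard deviation. The function names and the parameters `n_segments` and `noise_ratio` (with the values 4 and 0.05) are hypothetical.

```python
import random
import statistics
from typing import List

Signal = List[float]


def invert(signal: Signal) -> Signal:
    """Assumed meaning of 'inversion': flip the sign of every sample."""
    return [-x for x in signal]


def recombine(signal: Signal, n_segments: int, rng: random.Random) -> Signal:
    """Hypothetical 'recombination': cut the signal into n_segments contiguous,
    non-empty pieces and concatenate them in a random order."""
    if n_segments < 1 or n_segments > len(signal):
        raise ValueError("n_segments must be between 1 and len(signal)")
    # Evenly spaced integer boundaries; each segment gets at least one sample.
    bounds = [i * len(signal) // n_segments for i in range(n_segments + 1)]
    segments = [signal[bounds[i]:bounds[i + 1]] for i in range(n_segments)]
    rng.shuffle(segments)
    return [x for seg in segments for x in seg]


def inject_noise(signal: Signal, noise_ratio: float, rng: random.Random) -> Signal:
    """Add zero-mean Gaussian noise whose standard deviation is
    noise_ratio times the (population) standard deviation of the signal."""
    sigma = noise_ratio * statistics.pstdev(signal)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment(signal: Signal, rng: random.Random) -> List[Signal]:
    """Return the original window plus one augmented copy per transform.
    The parameter values (4 segments, 5% noise) are illustrative only."""
    return [
        signal,
        invert(signal),
        recombine(signal, n_segments=4, rng=rng),
        inject_noise(signal, noise_ratio=0.05, rng=rng),
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    window = [float(i) for i in range(12)]  # stand-in for one ECG/GSR/RSP window
    for copy in augment(window, rng):
        print(len(copy), copy[:4])
```

Every transform in this sketch preserves the window length, so augmented copies could be fed to the same CNN input layer as the originals; each copy would simply inherit the arousal and valence labels of the window it was derived from.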