Abstract: Estimating affective states from physiological data has recently attracted considerable research attention because of its broad applicability in everyday settings. Advances in wearable technology have made physiological signals easier to collect, which underscores the need for a robust system that can reliably recognize and interpret user states. This work introduces a method for valence-arousal estimation from physiological signals. We propose an efficient multi-scale Transformer-based architecture that fuses signals from multiple modern sensors for the emotion recognition task. Our approach combines a multi-modal technique with data scaling to model the relationship between internal body signals and human emotions. In addition, we use Transformer and Gaussian transformation techniques to improve signal encoding and overall performance. On the CASE dataset, the proposed model achieves a Root Mean Squared Error (RMSE) of 1.45.
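For readers unfamiliar with the reported metric, the following is a minimal sketch of how RMSE is computed between predicted and annotated affect values. The function name `rmse` and the sample values are hypothetical and for illustration only; they are not taken from the CASE dataset, and the abstract does not state how errors are aggregated across valence and arousal, so the sketch scores a single dimension.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each per-sample error, average, then take the square root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical annotated and predicted valence values (illustrative only).
valence_true = [5.0, 6.0, 4.0, 7.0]
valence_pred = [4.0, 6.0, 5.0, 9.0]

valence_rmse = rmse(valence_pred, valence_true)
print(f"valence RMSE: {valence_rmse:.3f}")
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones, so a lower value indicates predictions that stay consistently close to the annotations.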