Abstract: Continuous human affect estimation from video data entails modelling the dynamic emotional state from a sequence of facial images. Though multiple affective video databases exist, they are limited in both data volume and dynamic annotations, as assigning continuous affective labels to video data is subjective, onerous and tedious. While studies have established the existence of signature facial expressions corresponding to the basic categorical emotions, individual differences in emoting facial expressions nevertheless exist; factoring out these idiosyncrasies is critical for effective emotion inference. This work explores continuous human affect recognition using AFEW-VA, an 'in-the-wild' video dataset with limited data, employing subject-independent (SI) and subject-dependent (SD) settings. The SI setting involves training and test sets with mutually exclusive subjects, while in the SD setting training and test samples corresponding to the same subject can occur. A novel, dynamically-weighted loss function is employed with a Convolutional Neural Network (CNN)-Long Short-Term Memory (LSTM) architecture to optimise dynamic affect prediction. Superior prediction is achieved in the SD setting, as compared to the SI counterpart.
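The abstract does not specify the form of the dynamically-weighted loss. As a minimal, hypothetical illustration of the general idea (not the authors' actual formulation), the sketch below weights each frame's squared error by its current normalised absolute error, so harder-to-predict frames contribute more to the loss at each step. All names and the weighting scheme are assumptions for illustration only.

```python
# Hypothetical sketch of a dynamically-weighted regression loss.
# Weights are recomputed from the current per-frame absolute error,
# so frames with larger error are emphasised. This is NOT the
# paper's actual loss, which the abstract does not define.

def dynamically_weighted_mse(preds, targets, eps=1e-8):
    """Weighted MSE where each frame's weight is proportional to
    its current absolute error (weights normalised to sum to 1)."""
    errors = [abs(p - t) for p, t in zip(preds, targets)]
    total = sum(errors) + eps  # eps avoids division by zero
    weights = [e / total for e in errors]  # larger error -> larger weight
    return sum(w * (p - t) ** 2 for w, p, t in zip(weights, preds, targets))

# Example: valence predictions vs. annotations for three frames.
loss = dynamically_weighted_mse([0.2, 0.8, -0.1], [0.0, 1.0, 0.0])
```

In practice such a loss would be applied over the CNN-LSTM's per-frame valence/arousal outputs; the design choice is that the weighting adapts during training rather than being fixed a priori.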