Spatio-temporal convolutional features with nested LSTM for facial expression recognition

Zhenbo Yu, Guangcan Liu, Qingshan Liu, Jiankang Deng

2018 (modified: 10 Apr 2022), Neurocomputing 2018

Abstract: In this paper, we propose a novel end-to-end architecture termed Spatio-Temporal Convolutional features with Nested LSTM (STC-NLSTM), which jointly learns the multi-level appearance features and temporal dynamics of facial expressions. More precisely, a 3DCNN extracts spatio-temporal convolutional features from the image sequences that represent facial expressions, and the dynamics of expressions are modeled by a Nested LSTM, which couples two sub-LSTMs, namely T-LSTM and C-LSTM. T-LSTM models the temporal dynamics of the spatio-temporal features in each convolutional layer, and C-LSTM integrates the outputs of all T-LSTMs so as to encode the multi-level features captured in the intermediate layers of the network. We conduct experiments on four benchmark databases, CK+, Oulu-CASIA, MMI and BP4D, and the results show that the proposed method outperforms the state-of-the-art methods.
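
The nesting described in the abstract can be illustrated with a minimal data-flow sketch. It is not the authors' implementation: the 3DCNN is replaced by random stand-in feature vectors, the weights are untrained, and the names `TinyLSTM` and `nested_lstm`, the layer dimensions, and the hidden sizes are all hypothetical. It also assumes that each T-LSTM passes only its final hidden state to the C-LSTM and that the C-LSTM reads those states in layer order; the abstract does not specify either detail.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Minimal single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def _gate(self, name, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.W[name], self.b[name])]

    def run(self, sequence):
        """Consume a list of input vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            xh = list(x) + h  # concatenate current input with previous hidden state
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def nested_lstm(layer_features, t_lstms, c_lstm):
    """layer_features[l][t] is the feature vector of conv layer l at time step t.

    Assumption: one T-LSTM per layer runs over time; the C-LSTM then runs
    over the per-layer summaries, ordered from shallow to deep.
    """
    per_layer = [t_lstm.run(seq) for t_lstm, seq in zip(t_lstms, layer_features)]
    return c_lstm.run(per_layer)


rng = random.Random(0)
layer_dims = [4, 6, 8]   # hypothetical pooled feature size of each conv layer
time_steps = 5           # hypothetical number of frames in the sequence

# Random stand-ins for the spatio-temporal features a 3DCNN would produce.
layer_features = [[[rng.random() for _ in range(d)] for _ in range(time_steps)]
                  for d in layer_dims]

t_lstms = [TinyLSTM(d, 3, rng) for d in layer_dims]  # one T-LSTM per conv layer
c_lstm = TinyLSTM(3, 2, rng)                         # C-LSTM over T-LSTM outputs
representation = nested_lstm(layer_features, t_lstms, c_lstm)
print(len(representation))
```

In a trained system the resulting vector would feed an expression classifier; here it only shows how temporal modeling per layer and cross-layer integration compose.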
