Abstract: Automatic facial expression recognition plays a crucial role in computer vision and pattern recognition. However, most existing deep learning-based facial expression classifiers achieve high average accuracy but recognize difficult expressions, such as fear and disgust, poorly. In this paper, we propose a novel end-to-end architecture, termed the two-stream inter-class variation enhancement network, which jointly learns high-level semantic features and subtle inter-class variations. More precisely, a global feature extraction network extracts spatial-channel semantic features, and a distinction-reinforced network models the variations between different expressions. The outputs of these two streams are combined by weighted integration in the expression classification network. In addition, a class-balanced weighted cross-entropy loss is designed to further improve feature discrimination. Experimental results indicate that the proposed network significantly improves the recognition of difficult expressions and achieves average recognition accuracies of 73.67% on FER2013, 86.17% on RAFDB, 98.19% on CK+, and 98.85% on Oulu-CASIA, outperforming other state-of-the-art methods.
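As a rough illustration of the two ideas named in the abstract (weighted integration of two streams and a class-weighted cross-entropy), the following is a minimal sketch and not the authors' implementation. The abstract does not specify the fusion rule or the weighting scheme, so the mixing coefficient `alpha`, the inverse-frequency weights, and all function names here are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(global_logits, distinction_logits, alpha=0.5):
    """Weighted sum of the two streams' outputs.

    alpha is a hypothetical mixing weight; the paper's actual
    integration scheme is not given in the abstract.
    """
    return [alpha * g + (1.0 - alpha) * d
            for g, d in zip(global_logits, distinction_logits)]


def class_weights(class_counts):
    """Inverse-frequency class weights (an assumed scheme).

    Scaled so that the average weight over all training samples is 1.
    """
    n_classes = len(class_counts)
    total = sum(class_counts)
    return [total / (n_classes * c) for c in class_counts]


def weighted_cross_entropy(logits, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -weights[label] * math.log(probs[label])


# Toy example: class 0 is common (100 samples), class 1 is rare (10 samples).
weights = class_weights([100, 10])
fused = fuse_logits([2.0, 0.0], [0.0, 2.0], alpha=0.5)
loss_rare = weighted_cross_entropy(fused, 1, weights)
loss_common = weighted_cross_entropy(fused, 0, weights)
```

In this toy setting both classes receive the same predicted probability, yet the loss on the rare class is ten times larger, which is the mechanism by which a class-weighted loss pushes a classifier to attend to under-represented expressions.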