Multi-Information Hierarchical Fusion Transformer with Local Alignment and Global Correlation for Micro-Expression Recognition
Abstract: Learning discriminative features from low-intensity facial movements is a key challenge in micro-expression recognition. Although existing research has shown that appearance, motion, and geometric information are each discriminative for micro-expressions, the effectiveness of combining them remains unclear. This paper therefore proposes a Multi-information Hierarchical Fusion Transformer (MiHF-Tr) that aggregates the facial appearance, motion, and geometric information of micro-expressions, exploring a more reasonable approach to multi-information fusion. Because these types of information are homologous, MiHF-Tr introduces a local and global hierarchical fusion framework that fuses them by modeling their local and global semantic consistency. To account for differences in the representational ability of each type of information, a single-core self-attention is proposed for local multi-information fusion; it focuses on the strong information and supplements it with the weak information. Experimental results demonstrate that the fusion of appearance, motion, and geometric features is discriminative and that the proposed method effectively aggregates multiple types of information, achieving competitive performance.