PTSR: A Unified Patch Tokenization, Selection and Representation Framework for Efficient Micro-expression Recognition

Liangyu Fu, Junbo Wang, Qiangguo Jin, Yining Zhu, Hongsong Wang, Yuke Li, Xuecheng Wu, Kun Hu

Published: 30 Jun 2025, Last Modified: 16 Dec 2025 · CC BY-SA 4.0
Abstract: Micro-expression recognition is the challenging task of identifying hidden emotions, as micro-expressions are brief and involve small-scale facial muscle movements. Although deep learning-based methods, especially transformer-based ones, have achieved impressive performance on this task, they exhibit high computational complexity and struggle to learn effective representations on the typically small-scale micro-expression datasets, owing to the excess of tokens in multi-head self-attention. Moreover, most existing methods do not differentiate the importance of local features, which is critical in micro-expression recognition where changes are subtle. We therefore propose a novel unified Patch Tokenization, Selection and Representation framework (PTSR) with a vision Transformer for micro-expression recognition. Specifically, PTSR first introduces a dual-norm shifted patch tokenization (DNSPT) module to learn spatial relations between neighboring pixels of the face region, implemented via an elaborated spatial transformation and a dual-norm projection. We then employ a local-global attention module (LAM) to extract local-global image features, incorporating a dynamic token selection module (DTSM) that selects important patches/tokens and thereby captures more discriminative representations of the input clip. Extensive experiments on four widely used public datasets, i.e., CASME II, SAMM, SMIC, and CAS(ME)3, show that our method achieves clear improvements over state-of-the-art methods, e.g., an 8.37% gain on CAS(ME)3 in terms of UF1 and a 3.1% gain on SMIC in terms of UAR.
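To make the tokenization idea concrete, the following is a minimal NumPy sketch of a *shifted patch tokenization* step with a norm-projection-norm ("dual norm") head, in the spirit of the DNSPT module described above. The diagonal shift offsets, the projection width of 64, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension.
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def shifted_patch_tokenize(img, patch=4, out_dim=64, seed=0):
    """Hypothetical sketch: shift the image half a patch in the four
    diagonal directions, concatenate the shifted views with the original
    along the channel axis, split into non-overlapping patches, then apply
    norm -> linear projection -> norm (the 'dual norm' projection).

    img: array of shape (H, W, C); returns (num_patches, out_dim)."""
    H, W, C = img.shape
    s = patch // 2
    views = [img] + [
        np.roll(img, (dy, dx), axis=(0, 1))
        for dy, dx in [(s, s), (s, -s), (-s, s), (-s, -s)]
    ]
    stacked = np.concatenate(views, axis=-1)          # (H, W, 5*C)

    # Split into patch tokens and flatten each patch to one vector.
    ph, pw = H // patch, W // patch
    patches = stacked.reshape(ph, patch, pw, patch, 5 * C)
    tokens = patches.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)

    # Dual-norm projection; the weights here are random placeholders
    # standing in for a learned linear layer.
    rng = np.random.default_rng(seed)
    Wp = rng.standard_normal((tokens.shape[-1], out_dim))
    Wp /= np.sqrt(tokens.shape[-1])
    return layer_norm(layer_norm(tokens) @ Wp)

# A 32x32 RGB input with 4x4 patches yields 64 tokens of width 64.
img = np.random.default_rng(1).standard_normal((32, 32, 3))
tokens = shifted_patch_tokenize(img)
print(tokens.shape)  # (64, 64)
```

The shifted views give each patch token access to pixels just outside its own boundary, which is one way small spatial relations between neighboring regions can be injected before self-attention sees the tokens.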