Learning Holistic-Componential Prompt Groups for Micro-Expression Recognition

ICLR 2026 Conference Submission 19507 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Micro-expression Recognition, Facial Action Units, Visual-Language Models, Holistic-Componential Prompt Groups
Abstract: Micro-expressions (MEs) are brief facial muscle movements that reveal genuine underlying emotions. Due to their subtlety and visual similarity, micro-expression recognition (MER) presents significant challenges. Existing methods rely mainly on low-level visual features and lack an understanding of high-level semantics, making it difficult to differentiate fine-grained emotional categories effectively. Facial action units (AUs) provide encodings of local action regions, which help establish associations between emotional semantics and action semantics. However, the complex cross-mapping between emotional categories and AUs readily leads to semantic confusion. To address these problems, we propose a novel framework for MER, called HCP\_MER, which leverages the powerful alignment capabilities of visual-language models such as CLIP to construct multimodal visual-language alignments through holistic-componential prompt groups. We provide corresponding holistic emotion prompts and componential AU prompts for each emotion category to eliminate semantic ambiguity. By aligning optical-flow and motion-magnification representations with the componential and holistic prompts, respectively, our approach establishes multi-granularity, complementary visual-semantic associations. To ensure the precise attribution of predicted emotional semantics, we design a consistency constraint that enhances decision stability. Finally, we integrate adaptive gated fusion of the complementary responses with downstream supervisory-signal optimization to achieve fine-grained emotion discrimination. Experimental results on CASME II, SAMM, SMIC, and CAS(ME)$^3$ demonstrate that HCP\_MER achieves competitive performance, exhibiting strong robustness and discriminability.
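The dual-branch alignment and gated fusion described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the embedding dimensions, the random feature vectors, and the fixed scalar gate are all placeholder assumptions (in the actual method, visual and prompt embeddings would come from CLIP encoders and the gate would be learned).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over class logits
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cosine_sim(a, b):
    # CLIP-style similarity: cosine between visual and text embeddings
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
d, n_classes = 16, 5  # hypothetical embedding size and emotion count

# One holistic emotion-prompt embedding and one pooled componential
# AU-prompt embedding per emotion class (placeholders for CLIP text features).
holistic_prompts = rng.normal(size=(n_classes, d))
componential_prompts = rng.normal(size=(n_classes, d))

# Visual features: motion-magnification vs. optical-flow representations
# (placeholders for CLIP visual features of the two input modalities).
v_mag = rng.normal(size=(1, d))
v_flow = rng.normal(size=(1, d))

# Each visual representation is aligned with its own prompt group.
logits_holistic = cosine_sim(v_mag, holistic_prompts)
logits_componential = cosine_sim(v_flow, componential_prompts)

# Adaptive gated fusion of the two complementary responses; the gate is a
# learned sigmoid output in practice, fixed to 0.5 here for illustration.
gate = 1.0 / (1.0 + np.exp(-0.0))
fused = gate * logits_holistic + (1.0 - gate) * logits_componential
probs = softmax(fused)
pred = int(probs.argmax())
```

A consistency constraint between the two branches could then, for example, penalize divergence between `softmax(logits_holistic)` and `softmax(logits_componential)` to keep their predicted emotion attributions stable.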
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 19507