Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization

02 Sept 2025 (modified: 12 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: Micro-Action Recognition, Action Recognition, Time-frequency representation analysis
Abstract: Micro-action Recognition (MAR) is vital for psychological assessment and human-computer interaction. However, existing methods often fail in real-world scenarios because of inter-person variability: differences in motion style, execution speed, and physique cause the same action to manifest differently, which hinders robust generalization. To overcome this, we propose the Person Independence Universal Micro-action Recognition Framework (PIUmr), which embeds Distributionally Robust Optimization (DRO) principles to learn person-agnostic representations. PIUmr achieves this through two synergistic, plug-and-play components that operate at the feature level and the loss level, respectively. First, at the feature level, the Temporal–Frequency Alignment Module (TFAM) normalizes person-specific motion characteristics using a dual-branch architecture that disentangles motion patterns. The temporal branch uses Wasserstein-regularized alignment to produce a stable dynamic trajectory, mitigating variations caused by different motion styles and speeds. The frequency branch uses variance-guided perturbations to build robustness against person-specific spectral signatures that arise from different physical attributes (e.g., skeleton size). A consistency-driven mechanism then adaptively fuses the two branches. Second, at the loss level, the Group-Invariant Regularized Loss (GIRL) is applied to the aligned features to guide robust learning. It simulates challenging, unseen person-specific distributions by partitioning samples into pseudo-groups. By up-weighting hard boundary cases and regularizing subgroup variance, it forces the model to generalize beyond easy or frequent samples, improving robustness to the most difficult person-specific variations. Extensive experiments on the large-scale MA-52 dataset demonstrate that PIUmr significantly outperforms existing methods in both accuracy and robustness, achieving stable generalization under fine-grained conditions.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 739
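Illustrative sketch (not the authors' GIRL implementation): the loss-level idea in the abstract, partitioning samples into pseudo-groups, up-weighting harder groups, and penalizing variance across groups, can be approximated with a generic group-DRO-style objective. The function name `group_robust_loss`, the softmax weighting with temperature `eta`, and the variance weight `lam` are all assumptions introduced here for illustration; the abstract does not specify how pseudo-groups are formed or how the weights are computed.

```python
import math
from statistics import mean, pvariance

def group_robust_loss(losses, group_ids, eta=1.0, lam=0.1):
    """Hypothetical group-robust objective over per-sample losses.

    losses    : per-sample loss values (floats)
    group_ids : pseudo-group label for each sample (assumed given)
    eta       : assumed temperature; larger values emphasize harder groups
    lam       : assumed weight on the variance-across-groups penalty
    """
    # Collect per-sample losses by pseudo-group.
    groups = {}
    for loss, g in zip(losses, group_ids):
        groups.setdefault(g, []).append(loss)
    group_means = [mean(v) for v in groups.values()]

    # Softmax weights over group mean losses: harder groups get more weight.
    # Subtracting the max keeps the exponentials numerically stable.
    top = max(group_means)
    exps = [math.exp(eta * (gm - top)) for gm in group_means]
    total = sum(exps)
    weights = [e / total for e in exps]
    robust_term = sum(w * gm for w, gm in zip(weights, group_means))

    # Penalize spread of the group means to encourage group-invariant loss.
    variance_penalty = pvariance(group_means)
    return robust_term + lam * variance_penalty

# Two pseudo-groups: an easy one (mean loss 0.3) and a hard one (mean loss 1.2).
example = group_robust_loss([0.2, 0.4, 1.0, 1.4], [0, 0, 1, 1], eta=1.0, lam=0.1)
```

With `eta=0` the weights are uniform and the objective reduces to the average of group means plus the variance penalty; increasing `eta` moves the objective toward the worst-group loss, which is the usual DRO trade-off.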