Auxiliary Tasks Benefit Skeleton-based Action Recognition

Published: 2025, Last Modified: 04 Nov 2025, ICASSP 2025, CC BY-SA 4.0
Abstract: Skeleton-based action recognition has long been a fundamental and intriguing problem in machine intelligence. The task is challenging because pose occlusion and rapid motion typically result in incomplete or noisy skeleton data. State-of-the-art methods tend to learn human motion directly from these corrupted skeletons as if they were reliable, which can lead to unsatisfactory results when key regions of the skeleton are occluded or disturbed. To tackle this problem, we propose a novel framework that integrates auxiliary tasks into a motion modeling network. These auxiliary tasks corrupt parts of the human skeleton with masking or noise and then force the network to recover the corrupted data, explicitly facilitating robust feature representation learning. We further propose supervising the auxiliary tasks with mutual information losses, mathematically ensuring feature consistency and spatial alignment between the recovered and original skeleton data. Empirically, our approach achieves new state-of-the-art performance on three benchmark datasets.
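The corruption step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(frames, joints, channels)` array layout, the `mask_ratio` and `sigma` parameters, and the choice to mask whole joints across all frames are assumptions made here for illustration. The mean-squared recovery error is a simple stand-in and is not the mutual information loss the paper proposes.

```python
import numpy as np


def mask_joints(skeleton, mask_ratio=0.3, rng=None):
    """Zero out a random subset of joints in every frame (hypothetical helper).

    skeleton: array of shape (T, J, C) = (frames, joints, coordinates).
    Returns the corrupted copy and the indices of the masked joints.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_joints = skeleton.shape[1]
    num_masked = int(round(mask_ratio * num_joints))
    masked_idx = rng.choice(num_joints, size=num_masked, replace=False)
    corrupted = skeleton.copy()
    corrupted[:, masked_idx, :] = 0.0
    return corrupted, masked_idx


def add_noise(skeleton, sigma=0.05, rng=None):
    """Perturb every coordinate with zero-mean Gaussian noise (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    return skeleton + rng.normal(0.0, sigma, size=skeleton.shape)


def recovery_error(recovered, original):
    """Mean squared error between recovered and original skeletons.

    Assumption: a plain stand-in for the auxiliary objective, NOT the
    mutual information loss described in the abstract.
    """
    return float(np.mean((recovered - original) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(16, 25, 3))  # 16 frames, 25 joints, 3D coordinates
    masked, idx = mask_joints(clean, mask_ratio=0.2, rng=rng)
    noisy = add_noise(clean, sigma=0.05, rng=rng)
    print("masked joints:", sorted(idx.tolist()))
    print("error if the network returned the masked input:", recovery_error(masked, clean))
    print("error if the network returned the noisy input:", recovery_error(noisy, clean))
```

In a training loop of this kind, the corrupted array would be fed to the motion modeling network and the network's recovered output, rather than the corrupted input itself, would be compared against the clean skeleton alongside the classification objective.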