AL-SAR: Active Learning for Skeleton-Based Action Recognition

Published: 01 Jan 2024, Last Modified: 16 May 2025 | IEEE Trans. Neural Networks Learn. Syst. 2024 | CC BY-SA 4.0
Abstract: Action recognition from temporal multivariate sequences of features, such as identifying human actions, is typically approached by supervised training, which requires many ground-truth annotations to reach high recognition accuracy. Unsupervised methods for organizing sequences into clusters have been introduced; however, such methods still require annotations to associate clusters with actions. The difficulty of annotation calls for an effective classification methodology that minimizes the required number of labels. Active learning (AL) approaches have been proposed to address this challenge and have established robust results on image classification. Such approaches are not directly applicable to sequences, since sequences vary in both the spatial and temporal domains. In this brief, we introduce a novel AL method for sequences, called “AL-SAR,” which combines unsupervised training with sparsely supervised annotation. In particular, AL-SAR employs a multi-head mechanism for robust uncertainty evaluation of the latent space learned by an encoder-decoder framework. It aims to iteratively select a sparse set of samples whose annotation contributes the most to the disentanglement of the latent space. We evaluate our system on common benchmark datasets with multiple sequences and actions, such as NW-UCLA, NTU RGB+D 60, and UWA3D. Our results indicate that AL-SAR coupled with an encoder-decoder network outperforms other AL methods coupled with the same network structure.
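To make the selection step concrete, the following is a minimal sketch of uncertainty-based sample selection with several prediction heads. It is not the AL-SAR procedure itself, whose details the abstract does not give. It assumes that each head outputs class probabilities for every sequence and that uncertainty is scored by the entropy of the head-averaged prediction, a generic choice. The function name `select_uncertain` and its parameters are hypothetical.

```python
import numpy as np

def select_uncertain(head_probs, labeled_mask, k):
    """Pick up to k unlabeled samples with the highest predictive entropy.

    head_probs   : array of shape (H, N, C), class probabilities from H heads
                   for N sequences (assumed input, not taken from the paper).
    labeled_mask : boolean array of shape (N,), True where a label already exists.
    k            : number of samples to send for annotation in this round.
    """
    # Average the heads; disagreement between heads flattens the mean distribution.
    mean_probs = head_probs.mean(axis=0)                               # (N, C)
    # Entropy of the averaged prediction as a generic uncertainty score.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)   # (N,)
    # Already-labeled samples must never be selected again.
    entropy = np.where(labeled_mask, -np.inf, entropy)
    k = min(k, int((~labeled_mask).sum()))
    order = np.argsort(-entropy, kind="stable")                        # most uncertain first
    return order[:k]
```

In an iterative loop, the returned indices would be annotated, `labeled_mask` updated, and the classifier heads retrained on the enlarged labeled set before the next round of selection.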