Multi-scale Temporal Pose Analysis for Gait Recognition

Published: 01 Jan 2025 · Last Modified: 10 Nov 2025 · IbPRIA 2025 · CC BY-SA 4.0
Abstract: Work on gait recognition has primarily relied on silhouette or other visual modalities to describe the gait cycle. While these methods offer rich representations, they are heavily influenced by visual covariates such as body contours or carried objects. Pose-based methods provide greater robustness against these covariates, but current approaches have yet to extract sufficiently rich features from pose sequences, leading to suboptimal performance. In this work, we introduce MuSTGaitPose, a model architecture that performs multi-scale temporal analysis of pose sequences to extract richer gait features. Our model features the Multi-scale Temporal Block (MuST Block), which scans pose sequences at multiple time scales to identify key temporal patterns at each scale. We also develop Multi-scale Temporal Attention Fusion (MuSTAF) to aggregate the multi-scale features according to their relative importance at each spatial and temporal location. Our approach thus produces a combined feature that emphasizes the most relevant gait patterns across all considered time scales. Additionally, we leverage pose heatmaps to obtain a richer descriptor. Extensive experiments show that our approach outperforms previous pose-based methods, achieving mean Rank-1 accuracies of \(90.9\%\) on the CASIA-B and \(86.2\%\) on the SUSTech1K datasets, as well as a true acceptance rate of \(95.8\%\) at a false acceptance rate of \(1\%\) on the FVG-B dataset. Source code is available at https://github.com/Nico-Cubero/MuSTGaitPose.
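To make the two core ideas concrete, the following is a minimal, illustrative sketch of multi-scale temporal pooling over a pose sequence followed by attention-weighted fusion of the per-scale branches. All function names, the choice of window sizes, and the toy magnitude-based attention scores are assumptions for illustration only; they are not the authors' MuST Block or MuSTAF implementation, which learns its attention weights.

```python
# Toy sketch: multi-scale temporal analysis of a pose sequence with
# attention-based fusion of the scale branches. Window sizes and the
# magnitude-based scores below are illustrative assumptions.
import math

def temporal_pool(seq, window):
    """Average each feature over a centered temporal window (same length out)."""
    T, D = len(seq), len(seq[0])
    half = window // 2
    out = []
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out.append([sum(seq[u][d] for u in range(lo, hi)) / (hi - lo)
                    for d in range(D)])
    return out

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def multiscale_fuse(seq, scales=(1, 3, 5)):
    """Pool the sequence at several temporal scales, then fuse the branches
    per time step with softmax attention weights (here derived from each
    branch's feature magnitude as a stand-in for learned weights)."""
    branches = [temporal_pool(seq, w) for w in scales]
    T, D = len(seq), len(seq[0])
    fused = []
    for t in range(T):
        scores = [sum(abs(v) for v in branch[t]) for branch in branches]
        w = softmax(scores)
        fused.append([sum(w[k] * branches[k][t][d] for k in range(len(scales)))
                      for d in range(D)])
    return fused
```

In this sketch, each scale branch smooths the pose features over a different temporal extent, so short windows preserve fast gait events while long windows capture slower periodic structure; the fusion step then forms a convex combination of the branches at every time step, mirroring the abstract's description of emphasizing the most relevant patterns across scales.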