Real-Time Gait Recognition with Adaptive Spatial-Temporal Perception
Abstract: While YOLO-series models are computationally efficient in real-time visual tasks, extending their architectures to spatial-temporal gait recognition remains challenging. Existing video-based gait methods are often limited to local information aggregation and pairwise temporal correlation modeling. They cannot capture global many-to-many high-order correlations across dynamic sequences, which limits recognition performance in complex scenarios. In this paper, we propose GaitYOLOv13, an accurate and lightweight end-to-end gait recognition framework. To address these challenges, we draw inspiration from the Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism and adapt it to the spatial-temporal domain. This allows the network to adaptively exploit latent high-order correlations across consecutive frames and overcomes the limitation of previous methods that are restricted to pairwise correlation modeling based on hypergraph computation, achieving efficient global cross-frame and cross-scale feature fusion. Inspired by the Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm, we then distribute the correlation-enhanced features across the full pipeline, achieving fine-grained information flow and spatial-temporal representation synergy throughout the network. Finally, we replace vanilla large-kernel convolutions in the temporal domain with depthwise separable convolutions and design a series of blocks that significantly reduce parameters and computational complexity without sacrificing performance. We conduct extensive experiments on the widely used CASIA-B and GREW benchmarks. The results show that our method achieves state-of-the-art performance with fewer parameters and FLOPs. Specifically, GaitYOLOv13 improves Rank-1 accuracy by 1.2% over previous real-time baselines.
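To make the parameter saving from depthwise separable temporal convolutions concrete, the following minimal sketch counts the weights of a standard 1D convolution along the time axis and of its depthwise separable counterpart (a per-channel temporal filter followed by a 1x1 pointwise projection). The channel width of 256 and kernel size of 7 are illustrative assumptions, not values reported for GaitYOLOv13, and the helper names are hypothetical.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = False) -> int:
    """Weight count of a standard 1D convolution over the temporal axis."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def dw_separable_conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = False) -> int:
    """Weight count of a depthwise separable 1D convolution.

    Depthwise stage: one temporal filter of length `kernel` per input channel.
    Pointwise stage: a 1x1 convolution that mixes channels.
    """
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Assumed, illustrative layer sizes (not taken from the paper).
c_in = c_out = 256
kernel = 7

vanilla = conv1d_params(c_in, c_out, kernel)
separable = dw_separable_conv1d_params(c_in, c_out, kernel)
ratio = separable / vanilla  # equals 1/c_out + 1/kernel when bias is disabled

print(f"vanilla: {vanilla}, separable: {separable}, ratio: {ratio:.4f}")
```

Without bias terms, the ratio of separable to standard weights is 1/c_out + 1/kernel, so the saving grows with both the kernel size and the output channel width. This covers the weight count only; the blocks described in the abstract may differ in structure.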