Abstract: Gait recognition is a promising long-distance biometric recognition technology that is widely used in public security and video surveillance. In the recent literature, researchers apply fixed horizontal partitions to input images or intermediate features for feature extraction; such partitions are neither adaptive to variations in camera viewpoint and individual appearance nor sufficient for capturing fine-grained spatio-temporal features. Moreover, existing methods either ignore temporal modeling or model motion with insufficient or weakly discriminative temporal information. These findings suggest that gait recognition requires both fine-grained feature expression and stronger spatio-temporal modeling capability. Therefore, this study proposes a spatio-temporal augmented relation network (STAR), which adaptively generates multiple salient features over non-overlapping, diverse regions for fine-grained feature mining, and extracts spatio-temporally augmented features at rich temporal scales through joint learning of intra- and inter-region relations. Extensive experiments on the CASIA-B and OU-MVLP datasets demonstrate the effectiveness of the proposed method. On CASIA-B, STAR achieves rank-1 accuracies of 97.4%, 95.6%, and 87.1% under normal-walking, bag-carrying, and coat-wearing conditions, respectively, and 89.7% on OU-MVLP, setting new state-of-the-art results.
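To make the critique of fixed horizontal partitioning concrete, the following is a minimal NumPy sketch of the baseline scheme the abstract refers to: an intermediate feature map is cut into equal-height horizontal strips, each pooled into one part vector. The strip count, feature shape, and max+mean pooling choice here are illustrative assumptions, not the paper's method; STAR instead generates region partitions adaptively.

```python
import numpy as np

def horizontal_partition_pool(feat, num_strips=4):
    """Fixed horizontal partitioning (the non-adaptive baseline):
    split a [C, H, W] feature map into equal-height strips and
    pool each strip into a single C-dim part vector (max + mean)."""
    C, H, W = feat.shape
    assert H % num_strips == 0, "strip height must evenly divide H"
    # Group rows into num_strips horizontal bands of height H // num_strips.
    strips = feat.reshape(C, num_strips, H // num_strips, W)
    # Pool spatially within each band; result has shape [C, num_strips].
    pooled = strips.max(axis=(2, 3)) + strips.mean(axis=(2, 3))
    return pooled.T  # [num_strips, C]: one part vector per band

# Hypothetical mid-level gait feature map (channels, height, width).
feat = np.random.rand(256, 16, 11)
parts = horizontal_partition_pool(feat, num_strips=4)
print(parts.shape)  # (4, 256)
```

Because the strip boundaries are fixed at equal heights, the same body part can drift across strips as viewpoint or clothing changes, which is exactly the rigidity the abstract's adaptive region generation is designed to avoid.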