Unbiased Spatiotemporal Representation With Uncertainty Control for Person Reidentification

Xiu Zhang

Published: 15 Mar 2022, Last Modified: 04 Mar 2025 | IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS | Everyone | CC BY 4.0

Abstract: For person reidentification (re-id), most current research encodes spatial and temporal information by using convolutional neural networks (CNNs) to extract spatial features and recurrent neural networks (RNNs) or their variants to model temporal dependencies. However, this approach ignores the effect of complex backgrounds, which leads to a biased spatial representation. Furthermore, the RNNs are usually trained with backpropagation through time (BPTT), which makes long-term dependencies hard to learn because gradients vanish or explode. Moreover, the significance of a frame should not be biased by its position in a given sequence. In this article, a new method is proposed to learn an unbiased semantic representation for video-based person re-id. To handle background clutter and occlusion, a two-branch CNN model is used to obtain an enriched representation from both the foreground person and the original pedestrian images. Then, an unbiased bidirectional CNN architecture is developed to learn the unbiased spatial and temporal representation. Experimental results on three public data sets demonstrate the effectiveness of the proposed method.
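The position-bias concern raised in the abstract can be illustrated with a toy calculation. The sketch below is not the architecture proposed in the article; it only contrasts two generic ways of aggregating per-frame features, under the assumption of a simple linear recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t as a stand-in for a recurrent model. All function names and the `decay` parameter are hypothetical and chosen for illustration.

```python
def recurrent_frame_weights(num_frames, decay=0.5):
    """Effective weight of each frame in the final state of the toy recurrence
    h_t = decay * h_{t-1} + (1 - decay) * x_t, with h_0 = 0 (an assumption).
    Later frames receive exponentially larger weights."""
    return [(1 - decay) * decay ** (num_frames - 1 - t) for t in range(num_frames)]


def mean_pool_frame_weights(num_frames):
    """Uniform weights: every frame counts equally, whatever its position."""
    return [1.0 / num_frames] * num_frames


def aggregate(frames, weights):
    """Weighted sum of per-frame feature vectors (lists of floats)."""
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


# Four hypothetical 2-D frame features; only the first two frames are informative.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
reversed_frames = frames[::-1]

rec_w = recurrent_frame_weights(len(frames))
avg_w = mean_pool_frame_weights(len(frames))

# The recurrent-style aggregate changes when the frame order is reversed ...
rec_forward = aggregate(frames, rec_w)
rec_backward = aggregate(reversed_frames, rec_w)

# ... while the uniform aggregate does not depend on frame position.
avg_forward = aggregate(frames, avg_w)
avg_backward = aggregate(reversed_frames, avg_w)
```

In this toy setting the same informative frames contribute far less when they appear early in the sequence under the recurrent weighting, which is the kind of position-dependent bias the abstract argues a representation should avoid.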