Abstract: For person reidentification (re-id), most current research aims to encode the spatial and temporal information by
using convolutional neural networks (CNNs) to extract spatial
features and recurrent neural networks (RNNs) or their variants to capture temporal dependencies. However, this line of work ignores the
effect of complex backgrounds, which leads to a biased spatial
representation. Furthermore, it often trains the RNNs with backpropagation
through time (BPTT), which makes long-term dependencies hard to
learn because gradients vanish or explode. Moreover, the significance of a frame should not
be biased by its position in a given sequence. In this article, a
new method is proposed to learn an unbiased semantic representation for video-based person re-id. To handle background
clutter and occlusion, a two-branch CNN model is used to obtain
an enriched representation from both the foreground person and
the original pedestrian images. Then, an unbiased bidirectional CNN
architecture is developed to learn an unbiased spatial and temporal representation. Experimental results on three public
data sets demonstrate the effectiveness of the proposed method.