Abstract: With the remarkable progress of deep learning methods, person re-identification has received considerable attention from researchers. However, most previous work focuses on the supervised learning setting, which requires expensive data annotations. In this paper, we address this problem by proposing a purely unsupervised learning model. Inspired by the effectiveness of modeling the spatio-temporal information of pedestrian video, we mine the relationships between human body joints. Specifically, we propose a novel framework that learns intra-frame and inter-frame relationships for discriminative feature learning via two Graph Convolutional Network (GCN) modules: spatial and temporal. The spatial module captures the structural information of the human body, and the temporal module propagates information across adjacent frames. Finally, we perform hierarchical clustering, selecting P identities and K instances (PK sampling), to generate pseudo-labels for the unlabeled data. By iteratively optimizing these modules, our model extracts robust spatio-temporal information that can alleviate the occlusion problem. We conduct experiments on two benchmarks, MARS and DukeMTMC-VideoReID, and demonstrate the effectiveness of the proposed method.
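To make the PK sampling step concrete, the following is a minimal sketch of drawing a batch of P identities with K instances each from pseudo-labels. It assumes cluster ids have already been assigned by some clustering step; the function name `pk_sample`, the toy label list, and the choice to sample with replacement from small clusters are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def pk_sample(pseudo_labels, p, k, seed=None):
    """Return sample indices for a batch of p pseudo-identities, k instances each.

    pseudo_labels[i] is the cluster id assigned to sample i (hypothetical input).
    """
    rng = random.Random(seed)

    # Group sample indices by their pseudo-label (cluster id).
    by_label = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        by_label[label].append(idx)

    if len(by_label) < p:
        raise ValueError("fewer than p distinct pseudo-labels")

    batch = []
    # Pick p distinct pseudo-identities, then k instances of each.
    for label in rng.sample(sorted(by_label), p):
        members = by_label[label]
        if len(members) >= k:
            batch.extend(rng.sample(members, k))
        else:
            # Assumption: small clusters are sampled with replacement to reach k.
            batch.extend(rng.choices(members, k=k))
    return batch


# Toy pseudo-labels for ten samples spread over four clusters.
labels = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
batch = pk_sample(labels, p=3, k=2, seed=0)
```

Each batch then contains P x K samples, so every selected pseudo-identity contributes both positives and negatives for a metric-learning loss.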