Abstract: Video-based person re-identification aims to match the same identity across video clips captured by multiple non-overlapping cameras. By effectively exploiting both the temporal and spatial cues of a video clip, a more comprehensive representation of the identity in the clip can be obtained. In this manuscript, we propose a novel graph-based framework, referred to as Temporal Extension Adaptive Graph Convolution (TE-AGC), which can effectively mine features in the spatial and temporal dimensions within a single graph convolution operation. Specifically, TE-AGC adopts a CNN backbone and a key-point detector to extract global and local features as graph nodes. Moreover, a carefully designed adaptive graph convolution module encourages meaningful information transfer by dynamically learning the reliability of local features from multiple frames. Comprehensive experiments on two video person re-identification benchmark datasets demonstrate the effectiveness and state-of-the-art performance of the proposed method.
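The abstract only outlines the architecture at a high level. As a rough illustration of the idea, the sketch below shows one plausible PyTorch form of an adaptive graph convolution over spatio-temporal nodes: the global and local (key-point) features of every frame in a clip form one node set, the adjacency is learned dynamically from pairwise node affinities, and a per-node reliability score down-weights messages from unreliable local features. All names here (`AdaptiveGraphConv`, `reliability`, `num_parts`, and so on) are hypothetical and do not reflect the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    """Hypothetical sketch of an adaptive graph convolution for video
    re-ID. Nodes are the global and local features of every frame in a
    clip, so one convolution mixes information across space and time."""

    def __init__(self, dim):
        super().__init__()
        # Projections used to build a data-dependent adjacency matrix.
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        # Scalar reliability per node (e.g., to suppress occluded parts).
        self.reliability = nn.Linear(dim, 1)
        # Feature transform applied after message passing.
        self.transform = nn.Linear(dim, dim)

    def forward(self, nodes):
        # nodes: (batch, num_nodes, dim), where num_nodes = T * (1 + num_parts)
        # flattens the temporal and spatial axes into a single node set.
        q, k = self.query(nodes), self.key(nodes)
        # Dynamic adjacency from scaled pairwise affinities of node features.
        adj = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        # Scale each sender node's message by its learned reliability so
        # that unreliable local features contribute less to their neighbors.
        rel = torch.sigmoid(self.reliability(nodes))  # (batch, num_nodes, 1)
        messages = adj @ (rel * nodes)
        return F.relu(nodes + self.transform(messages))

# Example: a clip of T=4 frames, each with 1 global + 6 part features of dim 256.
clip_nodes = torch.randn(2, 4 * (1 + 6), 256)
out = AdaptiveGraphConv(256)(clip_nodes)  # output keeps the same shape
```

Because spatial and temporal nodes live in the same graph, a single such layer can pass information both between parts of one frame and between corresponding parts of different frames, which is the "temporal extension" behavior the abstract describes.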