Viewing from Frequency Domain: A DCT-based Information Enhancement Network for Video Person Re-Identification

2021 (modified: 16 Nov 2022), ACM Multimedia 2021
Abstract: Video-based person re-identification (Re-ID) aims to match target pedestrians across non-overlapping camera systems using video tracklets. The key issue in video Re-ID is exploring effective spatio-temporal features. Generally, the spatio-temporal information of a video sequence can be divided into two aspects: the discriminative information in each frame and the shared information over the whole sequence. To make full use of the rich information in video sequences, this paper proposes a Discrete Cosine Transform based Information Enhancement Network (DCT-IEN) to achieve a more comprehensive spatio-temporal representation from the frequency domain. Inspired by the principle that average pooling is a special case of the DCT frequency components (the lowest frequency component), DCT-IEN first adopts the discrete cosine transform to convert the extracted feature maps into the frequency domain, thereby retaining more of the information embedded in different frequency components. With the help of the DCT frequency spectrum, two branches are adopted to learn the final video representation: a Frequency Selection Module (FSM) and a Lowest Frequency Enhancement Module (LFEM). FSM explores the most discriminative features in each frame by aggregating different frequency components with an attention mechanism. LFEM enhances the shared feature over the whole video sequence by frame feature regularization. By fusing these two kinds of features, DCT-IEN achieves a comprehensive video representation. We conduct extensive experiments on two widely used datasets. The experimental results verify our idea and demonstrate the effectiveness of DCT-IEN for video-based Re-ID.
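
The principle the abstract builds on, that average pooling corresponds to the lowest DCT frequency component, can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's implementation: it uses an unnormalized 2D DCT-II, and the function names `dct2_coefficient` and `global_average_pool` and the toy feature map are hypothetical, introduced only for illustration.

```python
import math


def dct2_coefficient(feature_map, u, v):
    """Unnormalized 2D DCT-II coefficient (u, v) of an H x W feature map.

    Assumption: no orthonormal scaling is applied; the paper may use a
    different normalization.
    """
    h = len(feature_map)
    w = len(feature_map[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += (
                feature_map[i][j]
                * math.cos(math.pi * u * (i + 0.5) / h)
                * math.cos(math.pi * v * (j + 0.5) / w)
            )
    return total


def global_average_pool(feature_map):
    """Mean over all spatial positions of a single-channel feature map."""
    h = len(feature_map)
    w = len(feature_map[0])
    return sum(sum(row) for row in feature_map) / (h * w)


# Hypothetical 2 x 3 single-channel feature map.
feature_map = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
h, w = len(feature_map), len(feature_map[0])

# At (u, v) = (0, 0) both cosine factors equal 1, so the coefficient is the
# plain sum of the map; dividing by H * W gives the average-pooled value.
lowest = dct2_coefficient(feature_map, 0, 0)
gap = global_average_pool(feature_map)

# A higher-frequency component carries information that pooling discards.
higher = dct2_coefficient(feature_map, 1, 0)
```

With this normalization, `lowest / (h * w)` matches `gap`, while `higher` is non-zero for this map, which illustrates why keeping additional frequency components can retain information that average pooling alone drops.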