Top-K Visual Tokens Transformer: Selecting Tokens for Visible-Infrared Person Re-Identification

Published: 01 Jan 2023 · Last Modified: 02 Oct 2024 · ICASSP 2023 · CC BY-SA 4.0
Abstract: Visible-infrared person re-identification (VI-ReID), which matches pedestrian images across visible and infrared modalities, is an important and challenging task. Existing works mainly focus on reducing the modality gap with Convolutional Neural Networks (CNNs). However, the features extracted by CNNs may contain identity-irrelevant information, which inevitably reduces their discriminability. To address this issue, this paper introduces a Top-K Visual Tokens Transformer (TVTR) framework that employs a top-k visual token selection module to select the k most discriminative visual patches, reducing the distraction from identity-irrelevant information and learning discriminative features. Furthermore, a global-local circle loss is developed to optimize TVTR so that cross-modality positive pairs are concentrated and negative pairs are separated. Experimental results on the SYSU-MM01 and RegDB datasets demonstrate the superiority of our method. The source code will be released.
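The released code is not yet available, but the core idea of top-k visual token selection can be illustrated with a minimal PyTorch sketch. The sketch below is an assumption about one plausible realization (ranking patch tokens by the attention they receive from the class token and keeping the k highest-scoring patches); the function name, shapes, and scoring signal are illustrative, not the authors' implementation.

```python
import torch

def select_top_k_tokens(patch_tokens, cls_attention, k):
    """Hypothetical top-k visual token selection.

    patch_tokens:  (B, N, D) patch embeddings from a ViT backbone
    cls_attention: (B, N)    attention weights of the CLS token over patches
    k:             number of patches to keep
    """
    # Indices of the k most attended patches per image
    topk_idx = cls_attention.topk(k, dim=1).indices                     # (B, k)
    # Gather the corresponding patch embeddings, discarding the rest
    idx = topk_idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1))  # (B, k, D)
    return patch_tokens.gather(1, idx)

# Usage with made-up shapes: 8 images, 196 patches, 768-dim tokens
tokens = torch.randn(8, 196, 768)
attn = torch.rand(8, 196)
selected = select_top_k_tokens(tokens, attn, k=32)
print(selected.shape)  # torch.Size([8, 32, 768])
```

In this reading, the selected tokens would then feed the global-local feature learning and the global-local circle loss described in the abstract.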