Improving Person Re-identification with Semantically Aligned Appearance Transformer

IJCNN 2022
Abstract: Person re-identification (re-id) has achieved significant improvement with convolutional neural network (CNN)-based methods, but these methods suffer from the loss of detail information caused by convolution and downsampling operators. Recently, there has been growing interest in applying transformer-based methods to image classification tasks to overcome these limitations. However, person re-id is still limited by the body misalignment problem caused by pose/viewpoint variations, imperfect person detection, occlusion, etc. To address these problems, we leverage the estimation of the dense semantics of a person image to construct a set of densely human appearance semantically aligned images (DHASA-images), in which the same spatial positions share the same semantics across different images. In this paper, we take both original images and DHASA-images into consideration to obtain a discriminative representation of pedestrians. We propose a novel transformer-based approach for person re-id named the appearance semantically aligned guiding network (ASAG-Net). The network is composed of two subnetworks: one for feature extraction from the two kinds of images and another for feature fusion and final feature generation. To the best of our knowledge, we are the first to use dense human appearance semantic alignment to strengthen a transformer-based person re-id method. We demonstrate the proposed method through extensive experiments and achieve superior results on three large-scale person re-id datasets: Market1501, DukeMTMC-reID, and MSMT17.
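The abstract describes the architecture only at a high level. As a rough illustration of the two-branch design it outlines, here is a minimal PyTorch sketch in which a shared transformer encoder processes both the original image and its DHASA-image, and a small fusion head produces the final descriptor. All module names, dimensions, the weight sharing, and the concat-then-project fusion below are assumptions made for illustration, not the authors' actual ASAG-Net.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project them to tokens."""
    def __init__(self, patch=16, in_ch=3, dim=384):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, dim)

class TwoBranchReID(nn.Module):
    """Hypothetical two-branch model: encode original and DHASA images, then fuse."""
    def __init__(self, dim=384, depth=4, heads=6, out_dim=512):
        super().__init__()
        self.embed = PatchEmbed(dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        # A single shared encoder for both image kinds -- an assumption; the
        # paper may use separate or partially shared branch weights.
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.fuse = nn.Linear(2 * dim, out_dim)  # concat-then-project fusion head

    def encode(self, img):
        tokens = self.encoder(self.embed(img))
        return tokens.mean(dim=1)              # mean-pool tokens into one vector

    def forward(self, orig_img, dhasa_img):
        f_orig = self.encode(orig_img)         # features from the original image
        f_sem = self.encode(dhasa_img)         # features from the aligned image
        return self.fuse(torch.cat([f_orig, f_sem], dim=-1))

# Usage: a batch of original person crops plus their semantically aligned
# counterparts (e.g. rendered from DensePose-style dense semantics estimates).
model = TwoBranchReID()
orig = torch.randn(2, 3, 256, 128)
aligned = torch.randn(2, 3, 256, 128)
print(model(orig, aligned).shape)              # torch.Size([2, 512])
```

Mean-pooling the tokens and concatenating the two branch vectors is just one plausible fusion strategy; the guiding mechanism in the paper's second subnetwork is presumably more elaborate.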