MsKAT: Multi-Scale Knowledge-Aware Transformer for Vehicle Re-Identification

IEEE Trans. Intell. Transp. Syst., 2022
Abstract: Existing vehicle re-identification (Re-ID) methods usually suffer from intra-instance discrepancy and inter-instance similarity. The key to solving this problem lies in filtering out identity-irrelevant interference and collecting identity-relevant vehicle details. In this paper, we aim to design a robust vehicle Re-ID framework that trains a model guided by knowledge vectors and is able to disentangle identity-relevant features from identity-irrelevant features. Toward this end, we propose a novel Multi-Scale Knowledge-Aware Transformer (MsKAT) to build a knowledge-guided multi-scale feature alignment framework. First, we construct a Knowledge-Aware Transformer (KAT) to enable interaction between semantic knowledge and visual features. KAT mainly consists of a State elimination Transformer (SeT), which eliminates state interference (camera, viewpoint), and an Attribute aggregation Transformer (AaT), which gathers attribute information (color, type). Second, to learn knowledge-guided sample differences, we encourage the separation of identity-relevant and identity-irrelevant features with a Knowledge-Guided Alignment loss ($\mathcal{L}_{KGA}$). Specifically, $\mathcal{L}_{KGA}$ suppresses the difference between knowledge-guided positive pairs and the similarity between knowledge-guided negative pairs. Third, with the multi-scale settings of KAT and $\mathcal{L}_{KGA}$, our model can capture knowledge-guided visually consistent features at different scales. Extensive experiments demonstrate that our approach achieves new state-of-the-art results on three widely used vehicle re-identification benchmarks.
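For intuition, below is a minimal PyTorch-style sketch of a contrastive alignment loss in the spirit of $\mathcal{L}_{KGA}$: it reduces the feature difference of positive pairs (same identity) and the similarity of negative pairs (different identities). The function name, the `margin` parameter, and the implementation details are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def kga_alignment_sketch(features: torch.Tensor, labels: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Sketch of a knowledge-guided alignment objective (illustrative, not the paper's L_KGA).

    Pulls together features of positive pairs and suppresses the similarity
    of negative pairs, as described in the abstract.
    """
    # L2-normalize so pairwise cosine similarity is a simple dot product.
    f = F.normalize(features, dim=1)                      # (B, D)
    sim = f @ f.t()                                       # (B, B) pairwise similarities

    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) boolean identity matrix
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same_id & ~eye                             # positive pairs, excluding self-pairs
    neg_mask = ~same_id                                   # negative pairs

    # Suppress the difference (1 - similarity) of positive pairs ...
    pos_term = (1.0 - sim[pos_mask]).mean() if pos_mask.any() else sim.new_zeros(())
    # ... and the similarity of negative pairs (optionally clamped at a margin).
    neg_term = sim[neg_mask].clamp(min=margin).mean() if neg_mask.any() else sim.new_zeros(())
    return pos_term + neg_term

# Usage example with random features and hypothetical identity labels:
# feats = torch.randn(8, 256)
# ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
# loss = kga_alignment_sketch(feats, ids)
```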