H-ViT: Hybrid Vision Transformer for Multi-modal Vehicle Re-identification

Published: 01 Jan 2022, Last Modified: 15 Nov 2023, CICAI (1) 2022, Readers: Everyone
Abstract: Vehicle re-identification (ReID) is a critical technology for smart cities and has drawn much attention. Many studies focus on single-modal (i.e., visible) vehicle re-identification, whose performance tends to deteriorate under poor illumination. Multi-modal vehicle re-identification that combines visible, near-infrared, and thermal-infrared data is therefore worth studying. This paper proposes a hybrid vision transformer (H-ViT) for multi-modal vehicle re-identification. H-ViT introduces two new modules: (1) a modal-specific controller (MC) and (2) a modal information embedding (MIE) structure. During feature extraction, the MC flexibly specifies modal-specific layers for the different modalities and controls whether the position embedding is shared, alleviating the difficulty posed by heterogeneous modalities. The MIE structure learns inter- and intra-modal information to reduce feature deviations caused by modal variations. Experimental results show that, by integrating the MC and MIE modules, H-ViT outperforms existing algorithms on the multi-modal vehicle re-identification datasets RGBNT100 and RGBN300.
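The abstract gives no implementation details for the MC, so the following is only an illustrative sketch of the idea it describes: routing each modality to either its own or a shared parameter set, layer by layer, and toggling whether the position embedding is shared. The class name `ModalController`, its fields, the key naming scheme, and the choice to make the first `num_specific_layers` layers modal-specific are all assumptions made for this example, not the authors' design.

```python
from dataclasses import dataclass

# Visible, near-infrared, and thermal-infrared inputs (names are illustrative).
MODALITIES = ("rgb", "nir", "tir")


@dataclass(frozen=True)
class ModalController:
    """Hypothetical controller that picks a parameter set per modality and layer."""

    num_layers: int           # total number of transformer layers
    num_specific_layers: int  # assumption: the first k layers are modal-specific
    share_pos_embed: bool     # True: one position embedding for all modalities

    def _check(self, modality: str) -> None:
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality!r}")

    def layer_key(self, modality: str, layer_idx: int) -> str:
        """Return the name of the parameter set used at this layer."""
        self._check(modality)
        if not 0 <= layer_idx < self.num_layers:
            raise IndexError(f"layer index out of range: {layer_idx}")
        if layer_idx < self.num_specific_layers:
            return f"layer{layer_idx}.{modality}"
        return f"layer{layer_idx}.shared"

    def pos_embed_key(self, modality: str) -> str:
        """Return the name of the position embedding used by this modality."""
        self._check(modality)
        if self.share_pos_embed:
            return "pos_embed.shared"
        return f"pos_embed.{modality}"

    def parameter_keys(self) -> set:
        """Collect every distinct parameter set the configuration requires."""
        keys = set()
        for modality in MODALITIES:
            keys.add(self.pos_embed_key(modality))
            for idx in range(self.num_layers):
                keys.add(self.layer_key(modality, idx))
        return keys


if __name__ == "__main__":
    mc = ModalController(num_layers=12, num_specific_layers=2, share_pos_embed=False)
    print(mc.layer_key("nir", 0))    # modal-specific layer
    print(mc.layer_key("nir", 5))    # shared layer
    print(mc.pos_embed_key("tir"))   # per-modality position embedding
```

Under these assumptions, raising `num_specific_layers` gives each modality more private capacity at the cost of more parameters, while `share_pos_embed` trades a common spatial prior against modality-specific positional cues.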