Information Lossless Multi-modal Image Generation for RGB-T Tracking

Published: 01 Jan 2022, Last Modified: 10 Nov 2023. PRCV (4) 2022.
Abstract: Visible-Thermal infrared (RGB-T) multimodal target representation is a key factor affecting RGB-T tracking performance. Training an RGB-T fusion tracker end-to-end is difficult due to the lack of annotated RGB-T image pairs as training data. To alleviate this problem, we propose an information-lossless RGB-T image pair generation method. We generate TIR data from abundant annotated RGB data, and the resulting aligned, labeled RGB-T image pairs are used for RGB-T fusion target tracking. Unlike traditional image modality conversion models, this paper uses a reversible neural network to convert RGB images into TIR images. The advantage of this approach is that it generates information-lossless TIR data. Specifically, we design reversible modules and reversible operations for the RGB-T modality conversion task by exploiting the properties of reversible network structures. The model is then trained on a large amount of aligned RGB-T data without losing information. Finally, the trained model is added to the RGB-T fusion tracking framework to generate paired RGB-T images end-to-end. We conduct extensive experiments on the VOT-RGBT2020 [14] and RGBT234 [16] datasets, and the results show that our method obtains better RGB-T fusion features for representing the target. The performance on VOT-RGBT2020 [14] and RGBT234 [16] is $4.6\%$ and $4.9\%$ better than the baseline in EAO and Precision rate, respectively.
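The "information lossless" property rests on the defining feature of reversible networks: every forward transformation has an exact inverse, so no input detail is discarded. The abstract does not specify the paper's actual reversible modules; the sketch below only illustrates the principle with a generic additive coupling block (in the style of NICE/RealNVP), where the function `f` and all shapes are hypothetical.

```python
import numpy as np

# Additive coupling block: the input is split into two halves, and one
# half is updated conditioned on the other. Because the update is a pure
# addition, it can be undone exactly -- the source of losslessness.

def coupling_forward(x1, x2, f):
    y1 = x1
    y2 = x2 + f(x1)  # additive coupling: trivially invertible
    return y1, y2

def coupling_inverse(y1, y2, f):
    # Exact inverse: recovers the original halves bit-for-bit in exact
    # arithmetic (up to floating-point rounding here).
    x1 = y1
    x2 = y2 - f(y1)
    return x1, x2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 4))          # fixed weights for the demo
    f = lambda h: np.tanh(h @ W)             # f need not be invertible itself
    x1 = rng.standard_normal((2, 4))
    x2 = rng.standard_normal((2, 4))
    y1, y2 = coupling_forward(x1, x2, f)
    r1, r2 = coupling_inverse(y1, y2, f)
    print(np.allclose(x1, r1) and np.allclose(x2, r2))  # True: no information lost
```

Note that `f` can be an arbitrary (even non-invertible) network; the coupling structure alone guarantees invertibility, which is why such blocks suit a conversion task where the RGB content must remain fully recoverable.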