Multi Modality Siamese Feature Fusion Transformer Tracker for Object Tracking from Hyperspectral Videos

Published: 01 Jan 2023, Last Modified: 13 Nov 2024. Venue: WHISPERS 2023. License: CC BY-SA 4.0
Abstract: Hyperspectral imagery captures a wide range of spectral bands, which makes it possible to extract distinct spectral signatures from objects. This capability significantly improves the accuracy of object tracking, because objects can be differentiated by their unique spectral characteristics. Despite these benefits, hyperspectral tracking is challenged by the scarcity of training data and the diversity of spectral bands present in the data. To address these challenges, we propose a multi-channel, multi-RPN (region proposal network) Siamese network for object tracking in hyperspectral videos. It consists of three Siamese network branches: the first takes three-channel inputs, the second takes six-channel inputs, and the third takes twelve-channel inputs. The features extracted by the different Siamese branches may be redundant. To reduce this redundancy, the features are passed through a self-attention-based feature fusion network, which identifies and retains the most prominent and distinctive features. The fused features are then passed through multiple RPN blocks. Experimental results demonstrate the efficiency and effectiveness of the proposed method on hyperspectral object tracking datasets.
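The fusion step described in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention over three branch features. This is not the authors' network: the function names (`self_attention`, `fuse`), the toy feature vectors, the use of identity projections in place of learned query/key/value weights, and the final averaging over branches are all assumptions made here for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    Assumption: a real fusion network would learn query/key/value
    projections; they are omitted to keep the sketch self-contained.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for q in tokens:
        # Similarity of this branch's feature to every branch's feature.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all branch features for this query.
        attended.append(
            [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)]
        )
    return attended, weights


def fuse(branch_features):
    """Attend across branch features, then average them into one vector.

    Assumption: averaging is a hypothetical reduction chosen for brevity.
    """
    attended, weights = self_attention(branch_features)
    n, d = len(attended), len(attended[0])
    fused = [sum(tok[i] for tok in attended) / n for i in range(d)]
    return fused, weights


# Hypothetical pooled features from the 3-, 6-, and 12-channel branches.
feat_3ch = [1.0, 0.0, 0.5, 0.2]
feat_6ch = [0.9, 0.1, 0.4, 0.3]
feat_12ch = [0.0, 1.0, 0.1, 0.8]

fused, weights = fuse([feat_3ch, feat_6ch, feat_12ch])
```

In this toy setup, each row of `weights` shows how strongly one branch attends to the others. Here the 3-channel and 6-channel features are similar to each other, so they attend to one another more than to the 12-channel feature. In the described tracker, the fused features would then feed the RPN blocks, which this sketch does not model.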