Abstract: Learning a powerful representation of each individual modality is critical for RGBT tracking. Recent works mainly use multiple convolutions to model the feature representation of each modality. However, they usually rely on static convolutions to extract features, which struggle to handle complex input data. To address this problem, we propose a dynamic collaboration convolution, named DC-Conv, which comprises a set of static convolutions and a weight-router module, for robust RGBT tracking. Specifically, we assign four static convolutions to each modality in every layer, and design a weight-router module that fuses these static convolutions using learned dynamic weights. This dynamic weighting scheme allows the convolutions to adapt to variations in the input data and thus greatly improves tracking performance. In addition, we propose an effective progressive learning algorithm that maximizes the role of each convolution so that it captures discriminative representations. We evaluate our method on two public RGBT tracking benchmarks, and the results demonstrate the effectiveness of our tracker against state-of-the-art methods.
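
To make the weighting scheme concrete, the following is a minimal single-channel sketch in plain Python, not the authors' implementation. It assumes a router built from global average pooling, a linear map, and a softmax, and it fuses the four kernels before convolving; the abstract specifies neither choice. All names (`route_weights`, `dynamic_conv`, `router_w`, `router_b`) and all parameter values are hypothetical, and the parameters are random rather than learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def route_weights(image, router_w, router_b):
    """Assumed router: global average pool -> linear -> softmax."""
    pooled = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    scores = [w * pooled + b for w, b in zip(router_w, router_b)]
    return softmax(scores)


def dynamic_conv(image, kernels, router_w, router_b):
    """Fuse the static kernels with input-dependent weights, then convolve."""
    weights = route_weights(image, router_w, router_b)
    kh, kw = len(kernels[0]), len(kernels[0][0])
    fused = [
        [sum(p * k[a][b] for p, k in zip(weights, kernels)) for b in range(kw)]
        for a in range(kh)
    ]
    return conv2d_valid(image, fused), weights


rng = random.Random(0)


def random_grid(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Four static 3x3 kernels per modality, as stated in the abstract.
rgb_kernels = [random_grid(3, 3) for _ in range(4)]
thermal_kernels = [random_grid(3, 3) for _ in range(4)]

# Hypothetical router parameters (random here, learned in practice).
router_w = [rng.uniform(-1.0, 1.0) for _ in range(4)]
router_b = [rng.uniform(-1.0, 1.0) for _ in range(4)]

rgb_image = random_grid(5, 5)
thermal_image = random_grid(5, 5)

rgb_out, rgb_weights = dynamic_conv(rgb_image, rgb_kernels, router_w, router_b)
thermal_out, thermal_weights = dynamic_conv(
    thermal_image, thermal_kernels, router_w, router_b
)
```

Because convolution is linear in the kernel, fusing the kernels first gives the same output as convolving with each static kernel and then summing the weighted outputs, at the cost of a single convolution per input. The two modalities share one router here only to keep the sketch short; whether routers are shared across modalities is not stated in the abstract.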