Transformer-based visual object tracking via fine-coarse concatenated attention and cross concatenated MLP
Abstract: Highlights
• The fine-coarse concatenated attention (FCA) learns information at different granularities within a single layer.
• The cross concatenated MLP (CC-MLP) is developed to capture local interaction information across samples.
• Based on the FCA and the CC-MLP, a novel encoder and decoder are designed for visual object tracking (VOT).
• The FCAT is introduced and achieves impressive performance.