Transformer-based visual object tracking via fine-coarse concatenated attention and cross concatenated MLP

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Pattern Recognit. 2024, CC BY-SA 4.0
Abstract: Highlights
• The fine-coarse concatenated attention (FCA) learns information at different granularities within a single layer.
• The cross concatenated MLP (CC-MLP) is developed to capture local interaction information across samples.
• Based on the FCA and the CC-MLP, a novel encoder and decoder are designed for visual object tracking (VOT).
• FCAT is introduced and achieves impressive tracking performance.
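The highlights do not say how FCA combines granularities. As one hedged reading of "fine-coarse concatenated attention", this sketch lets every fine token attend over the concatenation of the original (fine) tokens and average-pooled (coarse) tokens, so a single attention layer sees both levels of detail. It is an illustrative assumption, not the paper's FCA: the function names, the 1-D token layout, the pooling window, and the absence of learned query/key/value projections are all hypothetical simplifications.

```python
import math
from typing import List

# Hypothetical sketch: NOT the FCAT paper's implementation.
Matrix = List[List[float]]


def avg_pool_tokens(tokens: Matrix, window: int) -> Matrix:
    """Build coarse tokens by averaging non-overlapping runs of `window` fine tokens."""
    pooled = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        dim = len(group[0])
        pooled.append([sum(t[j] for t in group) / len(group) for j in range(dim)])
    return pooled


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fine_coarse_attention(tokens: Matrix, window: int = 2) -> Matrix:
    """Single-head attention whose keys/values are [fine tokens ; coarse tokens].

    Assumption: identity projections, so queries, keys and values are the tokens themselves.
    """
    coarse = avg_pool_tokens(tokens, window)
    keys = tokens + coarse  # concatenate fine and coarse tokens along the token axis
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    out = []
    for q in tokens:  # queries come from the fine tokens only
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y = fine_coarse_attention(x, window=2)
print(len(y), len(y[0]))  # prints "4 2": one output per fine token, same width
```

Because the keys and values include the pooled tokens, each output mixes token-level and region-level context in one layer. A practical tracker would typically add learned projections, multiple heads, and 2-D pooling over feature maps.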