Abstract: Highlights
• A transformer tracking framework modeling spatial–temporal features is proposed.
• A temporal information extractor is proposed to learn immediate appearance change.
• A spatial–temporal context enhanced fusion module is proposed to integrate features.