Abstract: Highlights
• Thermal infrared (TIR) object tracking is cast as a language modeling task for the first time.
• A multi-level progressive fusion module is devised to enrich the feature representation.
• Coordinate information is integrated into the cross-entropy loss.
• No additional prediction branch is needed to assess reliability.