Transformer vision-language tracking via proxy token guided cross-modal fusion

Published: 01 Jan 2023, Last Modified: 05 Mar 2025, Pattern Recognit. Lett. 2023, CC BY-SA 4.0
Abstract: Highlights
• Proxy token guided transformer-based baseline for vision-language tracking.
• Densely annotated long-term vision-language tracking dataset.
• Extensive experiments on a new long-term vision-language tracking dataset.