Abstract: Satellite videos capture dynamic changes across a large observed scene, which provides an opportunity to track object trajectories. However, existing multiple object tracking (MOT) methods require massive video annotations, which are time-consuming and error-prone to produce. To alleviate this problem, this article proposes a cross-domain multiple object tracker (CDTrack) that learns knowledge from multiple source domains. First, a cross-domain object detector with multilevel domain alignment is constructed to learn domain-invariant knowledge between remote sensing images and satellite videos. Second, the proposed method adopts a bidirectional teacher–student framework to fuse multiple source domains: two teacher–student models learn different domain knowledge and teach each other. Through this mutual learning, the proposed method alleviates the discrepancies between different domains. Finally, a simple weakly supervised Re-IDentification (Re-ID) model is proposed for long-term association. Experimental results on satellite video datasets demonstrate that the proposed method achieves strong performance without satellite video annotations. The code is available at https://github.com/XiangtaoZheng/CDTrack.