Abstract: Multi-object tracking (MOT) in satellite videos is an essential topic with many applications, such as traffic monitoring and disaster response. However, many multi-object trackers that perform well in natural scenes generalize poorly to satellite videos because of low object discriminability, which is caused by low spatial resolution and widespread indistinguishable backgrounds such as clouds and reflections. In this article, we design a novel MOT framework for satellite videos, called cross-frame tracker (CFTracker), from the perspectives of both network structure and training method. On the one hand, in the network structure design, we propose a cross-frame feature update (CFU) module that uses rich temporal semantic information to enhance object recognition and reduce the response to background noise. On the other hand, we reveal that the picture-pair training approach used by mainstream MOT networks is not entirely conducive to learning temporal semantic information. To better capture cross-frame feature connections and output time-consistent motion predictions, we train CFTracker with a novel cross-frame training flow (CT). Experiments demonstrate the effectiveness of CFTracker, which achieves state-of-the-art tracking accuracy and precision, with scores of 72.9% on the AIR-MOT dataset and 57.1% on the VISO dataset. The code will be available online.