CFDRM: Coarse-to-Fine Dynamic Refinement Model for Weakly Supervised Moving Vehicle Detection in Satellite Videos

Jie Feng, Quanpeng Jiang, Junpeng Zhang, Yuping Liang, Ronghua Shang, Licheng Jiao

Published: 2024, Last Modified: 01 Oct 2024, IEEE Trans. Geosci. Remote. Sens. 2024, CC BY-SA 4.0

Abstract: Deep learning methods have become the mainstream approach to moving vehicle detection in satellite videos. However, these methods require labor-intensive and time-consuming box-level annotations to predict accurate locations and sizes, which is challenging for large-scale satellite video datasets with hundreds of vehicles. To address this problem, a novel coarse-to-fine dynamic refinement model (CFDRM) is proposed for moving vehicle detection in satellite videos using only point-level annotations as supervision. CFDRM first generates initial proposal boxes and performs spatio-guided matching with the point annotations to obtain coarse box-level pseudo annotations. The initial priority of these coarse annotations is calculated by leveraging a locally consistent prior tailored to satellite videos. Then, a dynamic refinement detector is constructed to refine the coarse annotations into fine annotations through prior and predictive collaborative curriculum refinement. During the curriculum learning (CL) process, the coarse annotations are learned sequentially according to their priority, which is inferred from two sources: the prior knowledge given by the locally consistent prior and the detector's own predictions. Finally, a novel ambiguity-aware loss is designed to optimize the dynamic refinement detector from coarse annotations to fine annotations in an adaptively weighted fashion. Extensive experiments on the Jilin-1 and SkySat satellite video datasets demonstrate the superiority of CFDRM.
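The abstract does not specify how the spatio-guided matching between proposal boxes and point annotations is carried out. As a rough illustration of the general idea only, the sketch below assumes a simple rule that is not taken from the paper: each point is assigned the proposal box that contains it and whose center lies closest to it, and points that fall in no box are left unmatched. The function name `match_points_to_proposals` and the `(x1, y1, x2, y2)` box layout are hypothetical choices made for this example.

```python
import numpy as np


def match_points_to_proposals(points, boxes):
    """Assign each point the index of a containing proposal box.

    Assumed rule (illustrative, not the paper's method): among the boxes
    that contain a point, pick the one whose center is nearest to it.
    Returns -1 for points that no box contains.
    """
    points = np.asarray(points, dtype=float)  # shape (P, 2), columns (x, y)
    boxes = np.asarray(boxes, dtype=float)    # shape (B, 4), (x1, y1, x2, y2)
    if len(points) == 0 or len(boxes) == 0:
        return np.full(len(points), -1, dtype=int)

    # Box centers, shape (B, 2)
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0

    # inside[p, b] is True when point p lies within box b
    px, py = points[:, None, 0], points[:, None, 1]
    inside = (
        (px >= boxes[None, :, 0]) & (px <= boxes[None, :, 2])
        & (py >= boxes[None, :, 1]) & (py <= boxes[None, :, 3])
    )

    # Distance from every point to every box center; mask out non-containing boxes
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    dist = np.where(inside, dist, np.inf)

    best = dist.argmin(axis=1)
    best[~inside.any(axis=1)] = -1  # no containing box: leave unmatched
    return best


# Toy data: three proposals, four point annotations
proposals = [(0, 0, 10, 10), (8, 8, 16, 16), (100, 100, 110, 110)]
annotations = [(5, 5), (50, 50), (12, 12), (9, 9)]
matches = match_points_to_proposals(annotations, proposals)
```

Under this assumed rule, the matched boxes would play the role of the coarse box-level pseudo annotations, which the abstract says are then prioritized and refined by the detector.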