Abstract: Deep learning-based object detection models rely heavily on large-scale, precise annotations for training. However, manually creating bounding-box annotations for such data is both time-consuming and costly, especially for high-resolution satellite imagery containing densely packed small objects. To alleviate the burden of manual annotation, we propose a simple yet effective approach, called progressive self-training object detection (PSTDet), to enable accurate object detection in remote sensing imagery without relying on manual annotations. Our PSTDet framework consists of two main components: initial pseudo label generation (IPLG) and progressive self-training with relabeling (PST-R). In IPLG, we leverage unsupervised image clustering, unsupervised instance detection, and geometric constraints to automatically generate high-quality bounding-box annotations for the initial training dataset. This approach significantly reduces the time and expense associated with data annotation, laying a solid foundation for the subsequent progressive self-training stage. The annotations produced by IPLG serve as the training data for PST-R, which refines both the detector and the pseudo labels through progressive self-training and our proposed noisy pseudo label filtering strategy (NPLFilter). NPLFilter improves the quality of pseudo labels by integrating geometric constraints, prior knowledge, and category-adaptive thresholds. Experimental results demonstrate that our method achieves significant performance improvements on the challenging NWPU VHR-10.v2 and DIOR datasets. Notably, our method substantially outperforms state-of-the-art weakly supervised methods and compares favorably with fully supervised methods.
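To make the category-adaptive filtering idea concrete, the sketch below shows one plausible way to threshold pseudo labels per class. It is a minimal illustration under assumed data structures (a list of detection dicts) and an assumed per-class percentile rule; it is not the exact NPLFilter procedure described in the paper.

```python
# Illustrative sketch only: a generic category-adaptive confidence filter for
# pseudo labels. The percentile rule and data layout are assumptions made for
# illustration, not the paper's exact NPLFilter.
from collections import defaultdict

import numpy as np


def filter_pseudo_labels(detections, percentile=60.0, min_threshold=0.3):
    """Keep detections whose score clears a per-category adaptive threshold.

    detections: list of dicts with keys 'category', 'score', and 'bbox'
                (bbox given as [x1, y1, x2, y2]).
    Returns the retained detections.
    """
    # Group scores by category so each class receives its own threshold.
    scores_by_cat = defaultdict(list)
    for det in detections:
        scores_by_cat[det["category"]].append(det["score"])

    # Category-adaptive threshold: a percentile of each class's scores,
    # floored by a global minimum to discard very low-confidence boxes.
    thresholds = {
        cat: max(np.percentile(scores, percentile), min_threshold)
        for cat, scores in scores_by_cat.items()
    }

    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["bbox"]
        # A simple geometric sanity check (positive width and height) stands
        # in for the geometric constraints mentioned in the abstract.
        if x2 <= x1 or y2 <= y1:
            continue
        if det["score"] >= thresholds[det["category"]]:
            kept.append(det)
    return kept
```

In a self-training loop of this kind, such a filter would typically be applied to the detector's outputs on unlabeled images before the retained boxes are fed back as training labels for the next round.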