Abstract: Although modern remote sensing object detection (RSOD) methods have achieved advanced performance, they rely heavily on large amounts of annotated data. This article explores semi-supervised RSOD to mitigate annotation costs, leveraging recent extensive research on generic semi-supervised object detection (SSOD) based on the self-training paradigm. Current SSOD methods struggle to adapt to remote sensing images (RSIs) due to the complexity and variability of RSIs. Two key issues remain underexplored: the noise in pseudo-labels caused by model instability and the difficulty of distinguishing similar categories. This article introduces the temporal-feedback self-training (TST) framework, a novel approach to tackle these challenges in semi-supervised RSOD. TST consists of two components: temporal consistency-based pseudo-label certainty estimation (TCE) and temporal self-feedback feature refinement (TSF). TCE addresses pseudo-label noise during training by evaluating the stability of pseudo-label classification and localization over the training time series to assess pseudo-label quality. Meanwhile, TSF enhances pseudo-label quality by dynamically identifying the model's confusing categories as feedback for feature refinement. Both components jointly strengthen self-training-based RSOD as training progresses. We conducted extensive experiments on two challenging public datasets: DOTA and DIOR. The results demonstrate that the proposed TCE and TSF components significantly improve the baseline model's performance, surpassing state-of-the-art generic SSOD methods. This suggests that our approach is more effective than generic SSOD methods in addressing the challenges posed by RSIs.
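To make the TCE idea concrete, the following is a minimal sketch (not the authors' implementation) of how a pseudo-label's certainty could be scored from its history across training iterations: classification stability is the mean confidence penalized by its standard deviation, and localization stability is the mean IoU between consecutively predicted boxes. All function names and the exact scoring formula here are illustrative assumptions.

```python
import numpy as np

def box_iou(a, b):
    """IoU between two axis-aligned boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def temporal_certainty(scores, boxes):
    """Illustrative certainty score for one pseudo-label over T steps.

    scores: list of T classification confidences for the same object
    boxes:  list of T predicted boxes [x1, y1, x2, y2]
    A label whose confidence and location stay stable over time
    receives a higher certainty than one that fluctuates.
    """
    scores = np.asarray(scores, dtype=float)
    cls_stab = float(scores.mean() - scores.std())  # penalize fluctuation
    ious = [box_iou(boxes[t], boxes[t + 1]) for t in range(len(boxes) - 1)]
    loc_stab = float(np.mean(ious)) if ious else 1.0
    return max(0.0, cls_stab) * loc_stab
```

Under this toy scoring, a pseudo-label with steady confidence and a fixed box outscores one whose confidence and box drift between iterations, which is the intuition the abstract attributes to TCE.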