Abstract: Detecting small-rotated objects in remote sensing remains challenging because of feature dilution and insufficient rotation invariance. Feature dilution arises when the features of small objects are overwhelmed by background noise and progressively lost as network depth increases. The lack of rotation invariance stems from the fixed nature of convolution, which struggles to handle arbitrary orientations. To address these challenges, we propose rotation-invariant knowledge distillation (KD), a KD framework driven by vision-language models (VLMs) and tailored to small-rotated object detection in remote sensing. Our method introduces two novel components: enhanced-consistency feature distillation (ECFD) and rotation-invariant feature distillation (RIFD). ECFD mitigates feature dilution by aligning consistent language representations from VLMs with cross-depth features, so that small-rotated objects are represented consistently across network depths. RIFD enhances rotation invariance by using VLMs to distill robust rotational knowledge into detectors: it aligns positive and negative language features with detector features, which reduces sensitivity to orientation changes and mitigates class confusion. Our method significantly improves the performance of remote sensing object detectors without introducing additional computational overhead during inference. Extensive experiments on public remote sensing datasets with complex scenes demonstrate state-of-the-art results. The code is available at https://github.com/Shower-Lee9527/CRKD
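The two alignment ideas named in the abstract can be illustrated with a minimal sketch. This is not the released implementation: the function names, the cosine-distance consistency term, the InfoNCE-style contrastive term, and the temperature value are all assumptions chosen for illustration, and the toy vectors stand in for pooled detector features and VLM text embeddings.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def consistency_loss(depth_feats, text_emb):
    """Hypothetical ECFD-like term: mean cosine distance between the feature
    taken at each network depth and one shared text embedding."""
    return sum(1.0 - cosine(f, text_emb) for f in depth_feats) / len(depth_feats)

def contrastive_loss(feat, pos_text, neg_texts, temperature=0.1):
    """Hypothetical RIFD-like term (InfoNCE-style): pull a detector feature
    toward its positive text embedding and away from negative ones."""
    logits = [cosine(feat, pos_text) / temperature]
    logits += [cosine(feat, n) / temperature for n in neg_texts]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Toy usage: features of one object at three depths, one positive class
# embedding, and two negative class embeddings (all values made up).
depth_feats = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.1], [0.5, 0.4, 0.2]]
pos_text = [1.0, 0.0, 0.0]
neg_texts = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

total = consistency_loss(depth_feats, pos_text) + sum(
    contrastive_loss(f, pos_text, neg_texts) for f in depth_feats
) / len(depth_feats)
```

Both terms would act only on the training objective in such a setup, which is consistent with the abstract's statement that inference cost is unchanged; how the actual method weights or combines its losses is not stated here.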
External IDs: doi:10.1109/tgrs.2025.3639215