Abstract: Detecting small, arbitrarily oriented objects in complex remote sensing images remains a significant challenge in computer vision. Conventional CNN-based detectors struggle to capture the fine-grained structures of small, arbitrarily oriented objects. Moreover, existing single-stage methods rarely exploit cross-modal cues, leaving a semantic gap between category priors and visual features. To address these issues, we propose a novel detection framework incorporating a FocusConv module and a CLIP-guided head. The FocusConv module dynamically adjusts sampling points based on region-of-interest (RoI) classification scores to enhance feature extraction in target areas, improving small-object representation. The CLIP-guided head uses text-encoded category embeddings to align semantic information with image features through pixel-text matching, effectively guiding the detection head. Experimental results on the DOTA-v1.0 and DOTA-v1.5 benchmarks demonstrate that our method outperforms existing single-stage detectors, achieving state-of-the-art performance under single-scale conditions.
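The abstract describes two mechanisms: score-guided adjustment of sampling points and pixel-text matching against CLIP text embeddings. The sketch below is a minimal illustration of these two ideas, not the authors' released implementation; it assumes a PyTorch/torchvision environment, and the module names (ScoreGuidedConv, PixelTextHead) and parameters (e.g. text_feats) are hypothetical.

```python
# Minimal sketch of (1) sampling offsets modulated by a per-pixel score map and
# (2) a pixel-text matching head against CLIP text embeddings. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d


class ScoreGuidedConv(nn.Module):
    """Deformable-style conv whose sampling offsets are scaled by a score map."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, 3, padding=1)

    def forward(self, x, score_map):
        # score_map: (N, 1, H, W) in [0, 1]; higher scores permit larger
        # offsets, concentrating sampling on likely object regions.
        offsets = self.offset_pred(x) * score_map
        return deform_conv2d(x, offsets, self.weight, padding=self.k // 2)


class PixelTextHead(nn.Module):
    """Scores each pixel by cosine similarity to text-encoded category embeddings."""

    def __init__(self, feat_dim, embed_dim):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, embed_dim, 1)
        self.logit_scale = nn.Parameter(torch.tensor(10.0))

    def forward(self, feats, text_feats):
        # feats: (N, C, H, W); text_feats: (K, D), e.g. from a frozen CLIP text encoder.
        pix = F.normalize(self.proj(feats), dim=1)        # (N, D, H, W)
        txt = F.normalize(text_feats, dim=-1)             # (K, D)
        logits = torch.einsum("ndhw,kd->nkhw", pix, txt)  # pixel-text matching
        return self.logit_scale * logits                  # (N, K, H, W)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    scores = torch.sigmoid(torch.randn(2, 1, 32, 32))      # stand-in for RoI scores
    feats = ScoreGuidedConv(64, 64)(x, scores)
    text_feats = torch.randn(15, 512)                      # e.g. 15 DOTA categories
    logits = PixelTextHead(64, 512)(feats, text_feats)
    print(feats.shape, logits.shape)                       # (2, 64, 32, 32), (2, 15, 32, 32)
```

In this reading, the score map gates how far the convolution may deform its sampling grid, while per-pixel image embeddings are matched to category text embeddings by cosine similarity to guide the detection head; the actual paper may realize both steps differently.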
External IDs: dblp:conf/icic/WangYTXCLXW25