Keywords: Open-Set Aerial Object Detection, Automatic Label Engine, Multi-instance Open-set Aerial Dataset
Abstract: In recent years, language-guided open-world aerial object detection has gained significant attention because it aligns better with real-world application needs. However, because datasets are limited, most existing language-guided methods focus primarily on vocabulary-level guidance, which fails to meet the demands of more fine-grained open-world detection. To address this limitation, we construct a large-scale language-guided open-set aerial detection dataset encompassing three levels of language guidance: words, phrases, and sentences. We present the $\textbf{OS-W2S Label Engine}$, an automatic annotation pipeline that is built around an open-source large vision-language model, integrates image-operation-based preprocessing with BERT-based postprocessing, and can handle diverse scene annotations for aerial images. Using this label engine, we expand existing aerial detection datasets with rich textual annotations and construct a novel benchmark dataset, the Multi-instance Open-set Aerial Dataset $(\textbf{MI-OAD})$, which addresses the limitations of current remote sensing grounding data and enables effective open-set aerial detection. Specifically, MI-OAD contains 163,023 images and 2 million image-caption pairs, with multiple instances per caption, making it approximately 40 times larger than comparable datasets.
We also train state-of-the-art open-set methods from the natural image domain on our proposed dataset to validate their open-set detection capabilities. For instance, when trained on our dataset, Grounding DINO achieves improvements of 31.1 $AP_{50}$ and 34.7 Recall@10 for sentence inputs under zero-shot transfer conditions.
Both the dataset and the Label Engine will be made publicly available.
Croissant File: json
Dataset URL: https://kaggle.com/datasets/070cdff2f649a10895c6fa09a45a58d00982afd8a8ba573696f521edd59cc028
Code URL: https://anonymous.4open.science/r/MI-OAD
Supplementary Material: pdf
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 1101