['3c3', '< Abstract: Synthetic Aperture Radar (SAR) object detection has gained significant attention recently due to its irreplaceable all-weather imaging capabilities. However, this research field suffers from both limited public datasets (mostly comprising <2K images with only mono-category objects) and inaccessible source code. To tackle these challenges, we establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets, providing a large-scale and diverse dataset for research purposes. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created. With this high-quality dataset, we conducted comprehensive experiments and uncovered a crucial challenge in SAR object detection: the substantial disparities between the pretraining on RGB datasets and finetuning on SAR datasets in terms of both data domain and model structure. To bridge these gaps, we propose a novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework that tackles the problems from the perspective of data input, domain transition, and model migration. The proposed MSFA method significantly enhances the performance of SAR object detection models while demonstrating exellent generalizability and flexibility across diverse models. This work aims to pave the way for further advancements in SAR object detection. The dataset and code is available at https://github.com/zcablii/SARDet_100K.', '---', '> Abstract: Synthetic Aperture Radar (SAR) object detection is crucial for all-weather imaging but is severely hampered by two key limitations: the scarcity of large-scale, diverse, and publicly accessible datasets, and significant domain and model gaps when adapting models pretrained on optical imagery to SAR data. 
To address these challenges, we introduce SARDet-100K, the first COCO-level large-scale multi-class SAR object detection dataset, meticulously compiled from 10 existing datasets. This high-quality benchmark enables robust research and evaluation. Furthermore, we propose the Multi-Stage with Filter Augmentation (MSFA) pretraining framework to effectively bridge the identified domain and model gaps. MSFA tackles these disparities from the perspectives of data input, domain transition, and model migration, significantly enhancing the performance, generalizability, and flexibility of SAR object detection models across diverse architectures. This work establishes a new open-source benchmark and toolkit, paving the way for substantial advancements in SAR object detection. The dataset and code are available at https://github.com/zcablii/SARDet_100K.', '8,15c8,20', '< Limited resources. A significant obstacle in high-resolution SAR image object detection is the sensitivity of SAR images, coupled with the high costs associated with annotating these images. This severely restricts the availability of public datasets. Existing datasets, such as SAR-AIRcraft [90], Air-SARShip [76], SSDD [84], and HRSID [71], typically consist of a singular type of object against a simplistic background. Moreover, these datasets are generally limited in scale, potentially introducing bias when evaluating different methodologies. Additionally, a notable barrier to advancing research in SAR object detection is the lack of publicly accessible source code, making it challenging to reproduce previous research findings and conduct fair comparisons or build upon existing work.', '< To address this problem, we merge the most publicly available SAR detection datasets. 
This effort includes a comprehensive review of current public SAR detection resources, followed by the collection and standardization of these datasets into a uniform format, creating a unified large-scale multi-class dataset for SAR object detection, named SARDet-100k. This dataset comprises approximately 117k images and 246k instances of objects across six distinct categories. To our knowledge, SARDet-100k is the first dataset of COCO-scale magnitude in this research area. It significantly contributes to overcoming the previously mentioned limitations by providing a rich resource for the development and evaluation of SAR object detection models. Moreover, the dataset and source code will be made publicly available.', '< Transferring gaps. Through our empirical research and detailed analysis, we have identified that a principal hurdle in SAR object detection is the significant domain gap and model gap encountered when transferring a backbone network pretrained on natural RGB datasets (e.g., ImageNet [17]), to a detection network on SAR imagery. The domain gap stems from the stark visual discrepancies between RGB and SAR imagery, whereas the model gap arises from the model differences between the pretrained backbone and the whole detection framework employed in the downstream task.', '< To mitigate the aforementioned domain gap and model gap, we propose a novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework to bridge these gaps. This framework addresses the challenge from multiple angles: data input, domain transition, and model migration, each tailored to the unique properties of the SAR image detection task. For data input: to address the input domain gap between the pretrain and finetune datasets, we employ traditional, handcrafted feature descriptors. 
These descriptors efficiently transform the input data from pixel space to a feature space that is not only robust to noise but also statistically narrows the gap between data from RGB and SAR modalities (see Fig. 2(a)), thereby enhancing the transferability of pretrained knowledge. For domain transition: we propose a domain transition bridge utilizing an optical remote sensing detection dataset. This bridge connects natural RGB images through optics correlation and SAR images through object correlation, establishing a hierarchical pretraining approach that effectively closes the domain gap between RGB and SAR imagery (see Fig. 2(b)). For model migration: to guarantee thorough training of the entire detection framework and to facilitate complete model migration for finetuning, we employ the entire detector as a bridging model throughout the multi-stage pretraining process.', '< The MSFA framework demonstrates remarkable efficacy in reducing the substantial domain and model gaps typically encountered between the pretraining and finetuning stages. MSFA is not only effective but also general and applicable across various modern deep neural networks.', '< Our contribution to the field of SAR object detection can be concluded into the following FOUR points:', '< • Introduction of the first COCO-level large-scale dataset for SAR multi-category object detection.', '< • Identification of critical gaps in traditional model pretrain and finetune approaches for SAR object detection. • Proposal of a Multi-Stage with Filter Augmentation (MSFA) pretraining framework, which demonstrates remarkable effectiveness, as well as excellent generalizability and flexibility across various deep network models. • Establishment of a new benchmark in SAR object detection by releasing the datasets and code associated with our research. This contribution is expected to foster further advancements and progress in the field.', '---', '> Limited resources. 
High-resolution SAR image object detection faces substantial hurdles due to the inherent sensitivity of SAR imagery and the prohibitive costs of annotation. This severely curtails the availability of public datasets, which are typically small-scale (<2K images), often mono-category, and lack diverse backgrounds (e.g., SAR-AIRcraft [90], Air-SARShip [76], SSDD [84], HRSID [71]). Such limitations can introduce bias and hinder robust evaluation of methodologies. Furthermore, the scarcity of publicly accessible source code impedes reproducibility, fair comparisons, and collaborative advancement in the field.', '> To address these critical resource limitations, we meticulously surveyed, collected, and standardized 10 existing public SAR detection datasets into a unified, large-scale, multi-class benchmark: SARDet-100K. Comprising approximately 117k images and 246k instances across six distinct categories, SARDet-100K is, to our knowledge, the first dataset of COCO-scale magnitude in SAR object detection. This significantly enriches the research landscape, providing an unparalleled resource for developing and evaluating advanced SAR object detection models. Crucially, both the dataset and its associated source code will be made publicly available to foster transparency and accelerate research.', '> Transferring gaps. Our empirical research reveals a critical challenge in SAR object detection: substantial domain and model gaps when transferring backbone networks pretrained on natural RGB datasets (e.g., ImageNet [17]) to SAR imagery detection tasks. The domain gap arises from the stark visual discrepancies between RGB and SAR modalities, while the model gap stems from structural differences between a pretrained backbone and the complete detection framework required for downstream tasks.', '> To effectively bridge these transferring gaps, we introduce the novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework. 
MSFA comprehensively addresses these challenges through three key perspectives: data input, domain transition, and model migration.', '> *   **Data Input:** To mitigate the input domain gap, we employ traditional, handcrafted feature descriptors. These descriptors transform raw pixel data into a noise-robust feature space, statistically narrowing the gap between RGB and SAR modalities (see Fig. 2(a)) and significantly enhancing knowledge transfer.', '> *   **Domain Transition:** We propose a hierarchical pretraining approach using an optical remote sensing detection dataset as a domain transition bridge. This bridge connects natural RGB images (via optical correlation) and SAR images (via object correlation), effectively closing the domain gap between these modalities (see Fig. 2(b)).', '> *   **Model Migration:** To ensure comprehensive training and full model migration for finetuning, the entire detector framework is utilized as a bridging model throughout the multi-stage pretraining process.', '> The MSFA framework demonstrates remarkable efficacy in reducing both domain and model gaps, leading to significant performance enhancements. 
Importantly, MSFA is highly generalizable and flexible, applicable across a wide range of modern deep neural networks.', '> Our contributions to the field of SAR object detection are fourfold:', '> • We introduce SARDet-100K, the first COCO-level large-scale multi-category dataset for SAR object detection, significantly addressing the scarcity of diverse and extensive SAR benchmarks.', '> • We rigorously identify and characterize the critical domain and model gaps inherent in traditional pretraining and finetuning strategies for SAR object detection.', '> • We propose the Multi-Stage with Filter Augmentation (MSFA) pretraining framework, a novel and highly effective solution that demonstrates remarkable generalizability and flexibility across diverse deep network models, setting new state-of-the-art performance.', '> • We establish a new open-source benchmark for SAR object detection by publicly releasing the SARDet-100K dataset and all associated code, aiming to foster rapid advancements and reproducible research in the field.', '20c25', '< While recent studies have focused on tasks related to low-level processing [69; 86], classification [83; 85; 93; 26; 50; 78] and pretrain [29; 29], they have attempted to integrate classic handcrafted features into modern neural networks for robust SAR image feature extraction and refinement. In contrast, our work does not simply inject such handcrafted features into networks, but explores the benefits and potentials of handcrafted features in domain adaptation and SAR object detection under modern deep neural networks. This research area remains largely unexplored, and our work aims to bridge this gap.', '---', '> Recent studies on low-level processing [69; 86], classification [83; 85; 93; 26; 50; 78], and pretraining [29] have largely attempted to integrate classic handcrafted features into modern neural networks for robust SAR image feature extraction and refinement. 
In contrast, our work moves beyond simply injecting handcrafted features into networks; we thoroughly explore their benefits and potential in domain adaptation and SAR object detection within modern deep neural networks. This specific research area remains largely unexplored, and our work aims to bridge this gap.', '100c105', '< Our research endeavours to overcome the current obstacles prevalent in SAR object detection. We anticipate our contributions will pave the way for future research and innovations in this domain.   Furthermore, we convert all dataset annotations into the COCO annotation format [34]. This step ensures consistency and compatibility among the different datasets. Consequently, the merged dataset, SARDet-100K, is also standardized in the COCO format, which is readily compatible with popular open-source detection code frameworks, eliminating the need for additional manual data preprocessing. Fig. S6(b) provides an overview of the category-level statistics for the SARDet-100K dataset.', '---', '> Our research endeavors to overcome the current obstacles prevalent in SAR object detection, and we anticipate our contributions will pave the way for future research and innovations in this domain. Furthermore, all dataset annotations are converted into the COCO annotation format [34], ensuring consistency and compatibility. This standardization makes SARDet-100K readily compatible with popular open-source detection frameworks, eliminating the need for additional manual data preprocessing. Fig. S6(b) provides an overview of the category-level statistics for the SARDet-100K dataset.', '491d495', '< ']
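
The revised text describes standardizing several source datasets into one COCO-format dataset. The sketch below illustrates what such a merge could look like. It is not the authors' conversion script: the function name `merge_coco`, the toy inputs, and the choice to unify categories by lower-cased name are all assumptions made for illustration. It relies only on the standard COCO layout (`images`, `annotations`, `categories`, with `bbox` as `[x, y, width, height]`).

```python
from typing import Dict, List


def merge_coco(datasets: List[dict]) -> dict:
    """Merge COCO-style dicts, unifying categories by name and reassigning ids.

    Hypothetical helper for illustration only.
    """
    merged = {"images": [], "annotations": [], "categories": []}
    cat_ids: Dict[str, int] = {}  # lower-cased category name -> unified id
    next_img_id, next_ann_id = 1, 1

    for ds in datasets:
        # Map this dataset's local category ids onto the unified ids.
        local_to_unified = {}
        for cat in ds["categories"]:
            name = cat["name"].lower()
            if name not in cat_ids:
                cat_ids[name] = len(cat_ids) + 1
                merged["categories"].append({"id": cat_ids[name], "name": name})
            local_to_unified[cat["id"]] = cat_ids[name]

        # Image ids may collide across datasets, so assign fresh ones.
        img_map = {}
        for img in ds["images"]:
            img_map[img["id"]] = next_img_id
            merged["images"].append({**img, "id": next_img_id})
            next_img_id += 1

        # Rewrite each annotation to point at the new image and category ids.
        for ann in ds["annotations"]:
            merged["annotations"].append({
                **ann,
                "id": next_ann_id,
                "image_id": img_map[ann["image_id"]],
                "category_id": local_to_unified[ann["category_id"]],
            })
            next_ann_id += 1

    return merged


# Toy inputs (made up) with clashing ids and differently cased category names.
ds_a = {
    "images": [{"id": 7, "file_name": "a.png", "width": 512, "height": 512}],
    "annotations": [{"id": 3, "image_id": 7, "category_id": 5,
                     "bbox": [10, 20, 30, 40], "area": 1200, "iscrowd": 0}],
    "categories": [{"id": 5, "name": "Ship"}],
}
ds_b = {
    "images": [{"id": 7, "file_name": "b.png", "width": 256, "height": 256}],
    "annotations": [{"id": 3, "image_id": 7, "category_id": 1,
                     "bbox": [5, 5, 20, 10], "area": 200, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "aircraft"}, {"id": 2, "name": "ship"}],
}
merged = merge_coco([ds_a, ds_b])
```

Reassigning image and annotation ids avoids collisions between sources, and keying categories by name lets the same class from different datasets share one id. A real merge would also need a manual mapping for classes that are named differently across datasets.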
