Tri-Axial Scaling in Aerial Object Detection: Model Size, Dataset Size and Quality, and Test-Time Inference in the CADOT Challenge
Abstract: Advances in remote sensing technology and deep learning have paved the way for accurate aerial object detection in urban environments. However, object detection in these settings remains challenging because of dense scenes, small and occluded objects, and high variability across geographic domains. To tackle these challenges, we propose a tri-axial scaling framework for aerial object detection that systematically improves performance along three axes: model size, dataset size and quality, and inference strategy. First, we explore larger backbone architectures to enhance feature representation. Second, we apply diffusion-based data augmentation and balanced class sampling to improve training data diversity and address class imbalance. Third, we incorporate test-time augmentation and model ensembling to increase robustness during inference. Our solution ranks first on the leaderboard of the IEEE ICIP 2025 CADOT challenge. The source code and pretrained models are available at https://github.com/yjwong1999/Double_J_CADOT_Challenge.
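To make the "balanced class sampling" idea concrete, the following is a minimal sketch of one common way to do it: weight each training image by the inverse frequency of its rarest class, so that images containing under-represented classes are drawn more often. This is an illustration under stated assumptions, not the authors' implementation (see the repository above for that); the function name `balanced_image_weights` and the class names in the usage lines are hypothetical.

```python
import random
from collections import Counter


def balanced_image_weights(image_labels):
    """Return normalized per-image sampling weights.

    image_labels: list with one entry per image, each a list of the
    class names annotated in that image (hypothetical input format).
    """
    if not image_labels:
        raise ValueError("image_labels must not be empty")

    # Image-level frequency: in how many images does each class appear?
    class_freq = Counter(c for labels in image_labels for c in set(labels))

    n_images = len(image_labels)
    weights = []
    for labels in image_labels:
        if not labels:
            # Assumption: images without annotations get the weight of a
            # class that appears in every image, i.e. the lowest priority.
            weights.append(1.0 / n_images)
        else:
            # An image is as valuable as the rarest class it contains.
            weights.append(max(1.0 / class_freq[c] for c in set(labels)))

    total = sum(weights)
    return [w / total for w in weights]


# Usage: draw a class-balanced batch of image indices.
labels = [["building"], ["building"], ["building", "ship"], ["building"]]
weights = balanced_image_weights(labels)
rng = random.Random(0)
batch = rng.choices(range(len(labels)), weights=weights, k=8)
```

Weighting by the rarest class is only one design choice; alternatives such as averaging inverse frequencies over all classes in an image, or counting instances rather than images, trade off differently between rare-class recall and over-fitting to a few repeated images.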
External IDs: dblp:conf/icip/WongTTKH25