Resolution-Aware Criss-Cross Attention Detector for Small Object Detection in Aerial Images

Published: 29 Jun 2025, Last Modified: 13 Nov 2025
Venue: ICMR '25: Proceedings of the 2025 International Conference on Multimedia Retrieval
License: CC BY 4.0
Abstract: Detecting small objects in large-scale, high-resolution aerial images is challenging. Most existing detectors focus on the design of detection heads and fusion layers. They overlook both the information lost in the backbone and the high computational cost, even though computing resources are especially limited in aerial image analysis. To address these challenges, we propose the Resolution-Aware Criss-Cross Attention Detector (RACDet), built on a new backbone, RACNet, that exploits the contextual information in aerial images. By decomposing position information into orthogonal horizontal and vertical components, RACNet models spatial dependencies efficiently: for each pixel, it gathers contextual information from all other pixels in the same row and column. These position relationships are established early in the network and guide the subsequent convolutional processing across different resolutions. The proposed method also provides an adaptive representation of feature maps at multiple resolutions through normalized position encoding. In addition, it improves small-object detection accuracy with a regression loss based on a smooth Gaussian Wasserstein distance. We evaluate our method on two challenging aerial image datasets, VisDrone2019 and UAVDT. Comprehensive experiments show that our approach achieves state-of-the-art performance while substantially reducing FLOPs.
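The row/column decomposition described in the abstract can be illustrated with a single criss-cross attention pass in the style of CCNet. This is a minimal pure-Python sketch, not RACNet's implementation. The function name `criss_cross_attention`, the identity query/key/value projections, the scaled dot-product scores, and the absence of a residual connection are all assumptions made for illustration. The normalized position encoding and multi-resolution handling are not modeled.

```python
import math


def criss_cross_attention(x):
    """One criss-cross attention pass over a feature map x[H][W][C] (nested lists).

    Output pixel (i, j) is a softmax-weighted sum of the features on row i and
    column j: H + W - 1 positions, with (i, j) itself counted once.
    Illustrative assumptions: identity query/key/value projections,
    scaled dot-product scores, and no residual connection.
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])
    scale = 1.0 / math.sqrt(channels)
    out = []
    for i in range(height):
        out_row = []
        for j in range(width):
            # Criss-cross path: every pixel of row i, then column j minus (i, j).
            path = [(i, c) for c in range(width)]
            path += [(r, j) for r in range(height) if r != i]
            query = x[i][j]
            scores = [scale * sum(q * k for q, k in zip(query, x[r][c]))
                      for r, c in path]
            # Numerically stable softmax over the path positions.
            peak = max(scores)
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            # Weighted sum of the value vectors along the path.
            context = [0.0] * channels
            for e, (r, c) in zip(exps, path):
                weight = e / total
                for k in range(channels):
                    context[k] += weight * x[r][c][k]
            out_row.append(context)
        out.append(out_row)
    return out


# Usage: a 2 x 3 feature map with 2 channels.
fmap = [[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        [[0.2, 0.8], [0.9, 0.1], [0.3, 0.3]]]
ctx = criss_cross_attention(fmap)
print(len(ctx), len(ctx[0]), len(ctx[0][0]))  # 2 3 2
```

Each output pixel attends to H + W - 1 positions instead of all H × W, which is where the efficiency of the horizontal/vertical decomposition comes from. A single pass only reaches pixel (i, j)'s own row and column. CCNet therefore applies the module recurrently so that context can propagate between any two pixels. The abstract does not state whether RACNet does the same.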