Abstract: Owing to the wide field of view and background clutter in remote sensing imagery, objects are typically small and densely packed, so commonly used detection methods perform unsatisfactorily on small objects. In this article, we propose the multicontextual information aggregation YOLO (MCIA-YOLO) method, which combines three novel modules to effectively aggregate multicontextual information across channels, depths, and pixels. First, the channel-spatial information aggregation module assembles spatial global features according to channel contextual information, increasing the density of key information. Second, the shallow-deep information sparse aggregation module applies a sparse cross self-attention mechanism; by sparsely correlating long-range dependencies across different regions, it enhances the representation of small targets while removing redundant information. Third, to enrich local multiscale features and better identify dense targets, the multiscale weighted aggregation module convolves multireceptive-field information and performs weighted fusion. Our method achieves satisfactory performance on the VisDrone2019, UAVDT, and NWPU VHR-10 datasets, especially for small object detection, surpassing several state-of-the-art methods.
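The general idea behind the third module, convolving features at several receptive-field sizes and fusing the branch outputs with normalized weights, can be illustrated with a minimal 1-D sketch. The kernel sizes, the softmax-style weighting, and all names below are illustrative assumptions, not the authors' exact MCIA-YOLO design.

```python
import math

def conv1d(signal, kernel):
    """'Same'-padded 1-D convolution (correlation) of a feature row."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]

def multiscale_weighted_aggregation(signal, kernels, scores):
    """Run one branch per kernel (one receptive field each), then fuse
    the branch outputs with softmax-normalized weights."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    branches = [conv1d(signal, k) for k in kernels]
    return [sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(signal))]

# Three receptive fields (1-D analogues of 1x1, 3x3, and 5x5 kernels).
kernels = [[1.0], [0.25, 0.5, 0.25], [0.1, 0.2, 0.4, 0.2, 0.1]]
features = [0.0, 1.0, 0.0, 2.0, 0.0]
# Equal scores give each branch the same weight (1/3).
fused = multiscale_weighted_aggregation(features, kernels, [0.0, 0.0, 0.0])
print([round(v, 3) for v in fused])
```

In the actual detector these branches would be 2-D convolutions over feature maps and the fusion weights would be learned, but the aggregation pattern is the same.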