DKA-YOLO: Enhanced Small Object Detection via Dilation Kernel Aggregation Convolution Modules

Yicheng Qiu, Feng Sha, Li Niu

Published: 01 Jan 2024, Last Modified: 07 Apr 2025, IEEE Access 2024, CC BY-SA 4.0

Abstract: Small object detection is a pivotal sub-domain of computer vision. Previous efforts to improve detection accuracy have included augmenting the head layer, refining multi-layer feature pooling techniques, incorporating attention mechanisms, and optimizing loss functions. Despite these efforts, false negatives and classification ambiguities persist, leading to suboptimal results. To address these issues, we propose DKA-YOLO, a new model that focuses on improving the structure of convolution kernels. We develop novel modules based on the concept of dilation kernel aggregation convolution, integrate them into the YOLOv8 framework, and apply the enhanced model to small object detection tasks. In the backbone layer, the large-size dilation kernel aggregation convolution combines large kernel sizes with a dilated convolution structure, using extensive receptive fields to improve detailed feature extraction. In the neck layers, the multi-scale dilation kernel aggregation convolution uses a diverse set of kernels to enhance performance and efficiency. Finally, the model’s head layer employs a multi-scale convolution kernel detection module to enhance the feature expression diversity, generalization ability, and computational efficiency of detection. Experimental validation on public datasets shows that our method improves mean average precision by 1.5% on VisDrone and 1.15% on UAVDT compared with advanced previous methods. Our method also surpasses other previous models in comparative experiments, highlighting its superiority and robustness.
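
The abstract's point about combining large kernels with dilation to obtain extensive receptive fields can be illustrated with the standard formula for the span of a dilated kernel, k_eff = k + (k - 1)(d - 1). The sketch below is only an illustration of that arithmetic: the branch configuration and all function names are hypothetical and are not taken from the paper, and taking the largest branch span as the receptive field assumes the branches run in parallel on the same input, which the abstract does not specify.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that keeps the spatial size at stride 1 (odd kernel sizes)."""
    return dilation * (kernel_size - 1) // 2


def aggregated_receptive_field(branches) -> int:
    """Assuming parallel branches that are summed or concatenated,
    the receptive field of the aggregate is the largest branch span."""
    return max(effective_kernel_size(k, d) for k, d in branches)


# Hypothetical (kernel_size, dilation) pairs; NOT the paper's configuration.
branches = [(3, 1), (3, 2), (5, 2), (7, 3)]

for k, d in branches:
    print(f"k={k}, d={d}: span={effective_kernel_size(k, d)}, "
          f"padding={same_padding(k, d)}")

print("aggregated receptive field:", aggregated_receptive_field(branches))
```

Under these assumed settings, a 7x7 kernel with dilation 3 spans 19 input pixels per axis while keeping the weight count of a 7x7 kernel, which is the general trade-off that motivates dilated large-kernel designs.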