Multi-scale Traffic Camera Image Detection Network Based on Improved YOLOv8

Published: 2024 · Last Modified: 26 Jan 2026 · PRICAI (4) 2024 · CC BY-SA 4.0
Abstract: With the extensive deployment of traffic cameras across cities, utilizing these devices for vehicle detection has become essential for efficient traffic management. However, variations in vehicle sizes and in the distances between vehicles and cameras lead to inconsistent vehicle scales in the images, which complicates feature extraction. Furthermore, conventional object detection algorithms fail to fully exploit the rich feature information available across multiple layers. To address these issues, we develop a novel multi-scale traffic YOLO (MT-YOLO) based on the YOLOv8 model. First, we propose the spatial multi-scale (SMS) attention mechanism to improve the extraction of spatial multi-scale vehicle features and aggregate these features for dynamic spatial modulation. Second, we develop the multi-scale dilation transposition CARAFE (MDT-CARAFE) upsampling method, which utilizes dilated convolutions to enlarge the receptive field at different scales during upsampling. Lastly, we augment the network with an additional detection head and replace the original detection head with the adaptively spatial feature fusion (ASFF) detection head, which dynamically selects and fuses multi-layer feature maps, enabling the network to better capture multi-scale information. Our experiments demonstrate that the MT-YOLO model substantially improves results on the traffic camera detection dataset, achieving an mAP@0.5 of 80.9%, with a precision of 92% and a recall of 93%. These results surpass the performance of leading detection algorithms. In addition, the model completes an inference in only 12.5 milliseconds, which makes it highly deployable and valuable for real-world applications.
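The ASFF idea mentioned in the abstract (per-pixel softmax weights that decide how much each feature level contributes to the fused map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `asff_fuse`, the tensor shapes, and the use of plain NumPy instead of a learned 1x1 convolution for the weight logits are all illustrative assumptions.

```python
import numpy as np

def asff_fuse(features, weight_logits):
    """Sketch of adaptively spatial feature fusion (ASFF):
    per-pixel softmax weights select each level's contribution.

    features: list of L arrays, each (C, H, W), already resized to a
              common resolution.
    weight_logits: array (L, H, W) of per-level, per-pixel logits
              (in the real network these come from learned 1x1 convs).
    """
    logits = np.asarray(weight_logits, dtype=np.float64)
    # Softmax over the level axis, so the L weights at each pixel sum to 1
    logits = logits - logits.max(axis=0, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=0, keepdims=True)
    # Weighted sum of the level feature maps, broadcasting over channels
    fused = sum(wl[None, :, :] * f for wl, f in zip(w, features))
    return fused
```

With equal logits the fusion degenerates to a plain average of the levels; during training the logits diverge per pixel, letting the detector pick the scale whose features best match the local object size.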