QATFP-YOLO: Optimizing Object Detection on Non-GPU Devices with YOLO Using Quantization-Aware Training and Filter Pruning

Published: 01 Jan 2024, Last Modified: 02 Aug 2025 · ICCCN 2024 · CC BY-SA 4.0
Abstract: Object detection is significant in real-world applications, including self-driving cars, surveillance systems, and vision-enabled robotic systems, among others. Although benchmark deep learning-based approaches such as YOLO achieve high detection accuracy, they are typically computationally intensive and require GPUs for optimal performance, which prevents their wide deployment on low-power end-user devices. In particular, when these models are deployed on non-GPU devices, their inference speed degrades significantly due to the lack of GPU support. To this end, in this paper we propose an optimized object detection model called QATFP-YOLO (Quantization-Aware Training and Filter Pruning on YOLO), which aims to enhance inference speed on non-GPU devices and can be trained and run for inference on local end-user devices without GPU support. To reduce computational complexity, we propose two optimized training strategies for our QATFP-YOLO model: (i) a model quantization technique that reduces model size and memory usage without sacrificing accuracy, and (ii) a filter pruning technique that removes redundant parameters from the model, further reducing memory usage and inference time. Evaluating performance on a real smartphone, we find that our QATFP-YOLO model achieves exceptional inference speed, reaching approximately 88 frames per second and surpassing traditional YOLO-Lite models by over fourfold.
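The abstract does not include implementation details, but the two generic techniques it names are well established. The following is a minimal NumPy sketch, not the authors' code: `fake_quantize` illustrates the quantize-dequantize forward pass commonly used in quantization-aware training, and `prune_filters` illustrates L1-norm filter pruning on a convolutional weight tensor. All function names, the 8-bit symmetric scheme, and the L1 ranking criterion are illustrative assumptions.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    # Quantize-dequantize pass typical of quantization-aware training:
    # the forward pass sees quantized values, so the model learns to
    # tolerate the rounding error introduced at deployment time.
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8-bit symmetric
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def prune_filters(conv_w, keep_ratio=0.5):
    # conv_w has shape (out_channels, in_channels, kH, kW).
    # Rank whole filters by L1 norm and keep only the strongest ones,
    # shrinking the layer's parameter count and compute cost.
    n_keep = max(1, int(conv_w.shape[0] * keep_ratio))
    norms = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1)
    keep = np.sort(np.argsort(norms)[-n_keep:])  # indices of kept filters
    return conv_w[keep], keep
```

In a full pipeline, pruning a layer's output filters also requires removing the corresponding input channels of the next layer, followed by fine-tuning to recover accuracy; the sketch above shows only the per-layer selection step.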