Toward Real-Time and Efficient Perception Workflows in Software-Defined Vehicles

Published: 01 Jan 2025 · Last Modified: 26 Jul 2025 · IEEE Internet Things J., 2025 · License: CC BY-SA 4.0
Abstract: With the growing demand for software-defined vehicles (SDVs), deep learning-based perception models have become increasingly important in intelligent transportation systems. However, these models face significant challenges in enabling real-time and efficient SDV solutions because of their substantial computational requirements, which resource-constrained vehicles often cannot meet. As a result, these models typically suffer from low throughput, high latency, and excessive GPU/memory usage, making them impractical for real-time SDV applications. To address these challenges, our research focuses on optimizing model and workflow performance by integrating pruning and quantization techniques across various computational environments, using frameworks such as PyTorch, Open Neural Network Exchange (ONNX), ONNX Runtime, and TensorRT. We systematically explore and evaluate three distinct pruning methods in combination with multiprecision quantization workflows (FP32, FP16, and INT8) and report results on four evaluation metrics: 1) inference throughput; 2) latency; 3) GPU/memory usage; and 4) accuracy. Our pruning and quantization techniques, combined with optimized workflows, achieve up to $18\times$ faster inference and $16.5\times$ higher throughput while reducing GPU/memory usage by up to 30%, all with minimal impact on accuracy. Our work suggests the Torch-ONNX-TensorRT workflow with 16-bit floating-point (FP16) quantization and group pruning as the optimal strategy for maximizing inference performance. It demonstrates great potential for optimizing real-time, efficient perception workflows in SDVs, contributing to the broader application of deep learning models in resource-constrained environments.
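To make the recommended workflow concrete, the sketch below illustrates one possible realization of the Torch-ONNX-TensorRT path described in the abstract: structured channel pruning in PyTorch (a stand-in for the paper's group pruning), export to ONNX, and an FP16 TensorRT engine build. The model choice (ResNet-18), pruning ratio, and file names are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a prune -> ONNX -> TensorRT (FP16) workflow.
# Assumptions: torchvision ResNet-18 as the perception backbone, 30% channel pruning.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision.models as models

# 1) Load a perception backbone in eval mode.
model = models.resnet18(weights=None).eval()

# 2) Structured pruning: zero out 30% of output channels of each conv layer,
#    ranked by L2 norm; prune.remove() bakes the mask into the weights.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.ln_structured(module, name="weight", amount=0.3, n=2, dim=0)
        prune.remove(module, "weight")

# 3) Export the pruned model to ONNX for downstream runtimes.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "pruned_model.onnx",
    input_names=["input"], output_names=["logits"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)

# 4) Build an FP16 TensorRT engine from the ONNX file (command-line step):
#    trtexec --onnx=pruned_model.onnx --fp16 --saveEngine=pruned_model_fp16.plan
```

The same ONNX artifact can also be served through ONNX Runtime for FP32/FP16 comparisons, which is how the multiprecision workflows in the abstract could be benchmarked against each other.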