Multi-Accelerator Neural Network Inference via TensorRT in Heterogeneous Embedded Systems

Published: 01 Jan 2024, Last Modified: 26 Jul 2025 · COMPSAC 2024 · CC BY-SA 4.0
Abstract: Neural Network Inference (NNI) has become a critical element in mobile and autonomous systems, particularly for time-sensitive operations like obstacle detection and avoidance. Alongside execution time, energy consumption holds significant importance in such workloads, given that power is a limited resource in these systems. Modern System-on-Chips (SoCs) in mobile and autonomous devices are equipped with a diverse range of accelerators, each characterized by distinct power and performance features. To adapt to dynamically changing physical conditions, the execution flow of these critical workloads can be optimized to utilize multiple accelerators, allowing for a flexible trade-off between performance and energy consumption. In this study, we leverage multiple accelerators within an SoC to execute NNI using NVIDIA TensorRT. Our primary goal is to enable an energy-performance trade-off by intelligently distributing layers of a neural network between accelerators that prioritize performance and those that emphasize power efficiency. We first analyze the execution time and energy characteristics of neural network layer execution on various accelerators. We then examine the factors influencing layer execution. Finally, we propose two algorithms to determine the mapping of layers to accelerators, minimizing energy consumption while adhering to a predetermined target NN inference execution time. We evaluate our approaches on the NVIDIA AGX Orin SoC using the widely used ResNet50 model. Based on the experimental results, we suggest adopting a coarse-grained layer grouping strategy. For applications with stringent real-time requirements, we recommend the proposed LTN approach, which better meets the target execution time. In other scenarios, the Knapsack approach may be chosen for potential improvements in energy consumption.
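The Knapsack approach described in the abstract can be illustrated as a multiple-choice knapsack: each layer must run on exactly one accelerator, each (layer, accelerator) pair has a time cost and an energy cost, and we minimize total energy subject to a target inference time. The following is a minimal sketch, not the authors' implementation; the accelerator names, cost values, and the time-discretization tick are hypothetical.

```python
# Hypothetical sketch of a knapsack-style layer-to-accelerator mapping
# (not the paper's actual algorithm). Each layer can run on a
# performance-oriented or an efficiency-oriented accelerator; we minimize
# total energy subject to a target inference time via dynamic programming
# over discretized time (a multiple-choice knapsack).

def map_layers(layers, target_time, tick=0.1):
    """layers: list of dicts mapping accelerator name -> (time, energy).
    Returns (min_energy, assignment) or (None, None) if infeasible."""
    budget = int(target_time / tick)          # time budget in ticks
    INF = float("inf")
    dp = [0.0] + [INF] * budget               # dp[t] = min energy using t ticks
    choice = [[None] * (budget + 1) for _ in layers]
    for i, layer in enumerate(layers):
        ndp = [INF] * (budget + 1)
        for acc, (t, e) in layer.items():
            ticks = int(round(t / tick))
            for used in range(budget + 1 - ticks):
                if dp[used] + e < ndp[used + ticks]:
                    ndp[used + ticks] = dp[used] + e
                    choice[i][used + ticks] = (acc, used)
        dp = ndp
    best_t = min(range(budget + 1), key=lambda t: dp[t])
    if dp[best_t] == INF:
        return None, None                     # target time cannot be met
    # Backtrack to recover the per-layer accelerator assignment.
    assignment, t = [], best_t
    for i in range(len(layers) - 1, -1, -1):
        acc, prev = choice[i][t]
        assignment.append(acc)
        t = prev
    return dp[best_t], assignment[::-1]

# Example with hypothetical per-layer (time, energy) costs:
layers = [
    {"gpu": (1.0, 5.0), "dla": (2.0, 2.0)},
    {"gpu": (1.0, 4.0), "dla": (3.0, 1.0)},
]
energy, assignment = map_layers(layers, target_time=4.0)
```

In this toy example, running both layers on the GPU meets the deadline but costs 9 energy units, while mixing accelerators reaches a lower total of 6 within the 4.0 time budget. The LTN approach mentioned in the abstract would trade some of this energy optimality for tighter adherence to the target time.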