GPU Partitioning & Neural Architecture Sizing for Safety-Driven Sensing in Autonomous Systems

Shengjie Xu, Clara Hobbs, Yukai Song, Bineet Ghosh, Tingan Zhu, Sharmin Aktar, Lei Yang, Yi Sheng, Weiwen Jiang, Jingtong Hu, Parasara Sridhar Duggirala, Samarjit Chakraborty

Published: 2024, Last Modified: 04 Mar 2026
Venue: ICAA 2024
License: CC BY-SA 4.0
Abstract: Neural networks are now routinely used for perception processing in autonomous systems. Often, these networks estimate the state of the system, such as the distance and velocity of the car in front, which is then used in downstream control tasks. While significant advances in neural architecture search and sizing have been made towards improving inference quality, they do not take into account the effect of these improvements on the performance of the overall system. In this paper, we examine a setup where multiple neural networks, each estimating a different state component of the same system, share the same graphics processing unit (GPU), a limited computational resource. We address the problem of optimal resource allocation for each neural network, e.g., how to suitably size these networks, while improving the overall performance (specifically, safety) of the system. In particular, we distinguish between optimizing the performance of individual neural networks and optimizing system-level performance or safety. Our main technical contribution is a set of techniques for neural architecture sizing with the goal of optimizing overall system safety for a given GPU capacity. Our evaluation on two different benchmarks shows that we can explore the architecture space with 10x to 100x improvements in running time.