Abstract: In this paper, we introduce a machine learning-based performance modeling framework that accurately predicts both inference latency and energy consumption of neural networks deployed on edge GPUs, specifically targeting the NVIDIA Jetson Nano platform. To support this effort, we construct a comprehensive benchmark dataset consisting of latency and energy measurements for a wide range of deep learning models, spanning from lightweight architectures with 100 million multiply-accumulate (MAC) operations to large models with up to 50 billion MACs. Our performance modeler demonstrates high predictive accuracy across a diverse set of well-known architectures and generalizes effectively to unseen models. We further integrate our predictor into a hardware-aware neural architecture search (NAS) framework, showing that it accelerates the NAS process from days to hours without sacrificing the quality of the selected architectures. This work highlights the potential of learning-based performance estimation to enable fast, efficient, and hardware-aware deep learning model design for edge deployment.
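To make the idea of a learned latency predictor concrete, here is a minimal sketch, assuming a simple linear relationship between a model's MAC count and its measured latency. The paper's framework uses a richer learned model and real Jetson Nano measurements; the feature choice, training data, and function names below are illustrative placeholders, not the authors' implementation.

```python
# Sketch of a learned latency predictor: fit latency ≈ a * MACs + b
# by ordinary least squares. All numbers are synthetic placeholders,
# not measurements from the paper's benchmark dataset.

def fit_linear(xs, ys):
    """Closed-form least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# Hypothetical training set: (MACs in billions, measured latency in ms),
# spanning the 0.1-50 GMAC range the benchmark covers.
macs = [0.1, 1.0, 5.0, 20.0, 50.0]
latency_ms = [4.0, 12.0, 48.0, 180.0, 445.0]

a, b = fit_linear(macs, latency_ms)

def predict_latency(g_macs):
    """Predict latency (ms) for a model with g_macs billion MACs."""
    return a * g_macs + b

print(round(predict_latency(10.0), 1))
```

In practice a NAS loop would call such a predictor thousands of times per search, which is what lets it replace on-device measurement and cut search time from days to hours.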
External IDs: dblp:conf/coins/JunejaGAZ25