NeuroBalancer: Balancing System Frequencies With Punctual Laziness for Timely and Energy-Efficient DNN Inferences

Kyungmin Bin, Seyeon Kim, Sangtae Ha, Song Chong, Kyunghan Lee

Published: 01 Jan 2025, Last Modified: 16 May 2025. IEEE Trans. Mob. Comput. 2025. License: CC BY-SA 4.0

Abstract: On-device deep neural network (DNN) inference is often desirable for user experience and privacy. Existing solutions fully utilize resources to minimize inference latency. However, they are severely energy-inefficient because they complete DNN inference much earlier than the required service interval. This poses a new challenge: how to make DNN inferences in a punctual and energy-efficient manner. To tackle this challenge, we propose a new resource allocation strategy for DNN processing, namely punctual laziness, which disperses the workload as efficiently as possible over time within its strict delay constraint. This strategy is particularly beneficial for neural workloads because a DNN comprises a set of popular operators whose latency and energy consumption are predictable. Building on this understanding, we propose NeuroBalancer, an operator-aware core and memory frequency scaling framework that balances those frequencies as efficiently as possible while making timely inferences. We implement and evaluate NeuroBalancer on off-the-shelf Android devices with various state-of-the-art DNN models. Our results show that NeuroBalancer meets the given inference latency requirements while reducing energy consumption on CPU and GPU by up to 43.9% and 21.1%, respectively, compared to Android’s default governor, and by up to 42.1% and 18.6%, respectively, compared to SysScale, the state-of-the-art mobile governor.
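The idea of punctual laziness can be illustrated with a toy example. The sketch below is not NeuroBalancer's algorithm; it is a brute-force illustration under stated assumptions: each operator has a small set of frequency levels, each level has a known (latency, energy) pair, and idle energy is ignored. The operator names, the numbers in `PROFILE`, and the function `pick_levels` are all hypothetical.

```python
from itertools import product

# Hypothetical per-operator profile: one (latency_ms, energy_mJ) pair per
# frequency level, ordered from the lowest to the highest frequency.
PROFILE = {
    "conv": [(8.0, 10.0), (5.0, 14.0), (3.0, 22.0)],
    "matmul": [(6.0, 7.0), (4.0, 10.0), (2.5, 16.0)],
    "softmax": [(2.0, 2.0), (1.5, 3.0), (1.0, 6.0)],
}


def pick_levels(profile, deadline_ms):
    """Return (levels, energy_mJ, latency_ms) with the lowest total energy
    whose total latency fits the deadline, or None if no assignment fits.

    Exhaustive search; only practical for a handful of operators.
    """
    ops = list(profile)
    best = None
    for levels in product(*(range(len(profile[op])) for op in ops)):
        latency = sum(profile[op][lv][0] for op, lv in zip(ops, levels))
        energy = sum(profile[op][lv][1] for op, lv in zip(ops, levels))
        if latency <= deadline_ms and (best is None or energy < best[1]):
            best = (dict(zip(ops, levels)), energy, latency)
    return best


if __name__ == "__main__":
    print(pick_levels(PROFILE, deadline_ms=12.0))
```

With these made-up numbers and a 12 ms deadline, the search selects the middle level for `conv` and `matmul` and the lowest level for `softmax`, finishing in 11 ms for 26 mJ. Running every operator at its highest level would finish in 6.5 ms but cost 44 mJ, which is the kind of early completion the abstract describes as wasteful.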