Abstract: The growing complexity of machine learning (ML) tasks is driving the rapid deployment of multi-tenant ML workloads at the edge, which presents unique challenges due to variable computational demands and strict latency requirements. This paper introduces a holistic elastic scheduler, EMERALD, designed to optimize the execution of multi-tenant ML workloads on heterogeneous edge Multiprocessor System-on-Chip (MPSoC) platforms under strict runtime constraints. EMERALD employs input resolution scaling to dynamically adjust the computational demands of deep neural networks (DNNs), thereby enhancing the ability to meet stringent latency requirements while maintaining high accuracy. The scheduler consists of two main components: a local greedy scheduler and a global scheduler. The local scheduler actively manipulates input resolution in response to deadline violations, selecting the resolutions that minimally impact accuracy and maximally reduce response time. The global scheduler, an Integer Linear Programming (ILP)-based scheduler, fine-tunes the decisions of the local scheduler by considering factors such as DNN dependencies, scene complexity, hardware heterogeneity, and the trade-offs between accuracy and makespan associated with input scaling adjustments. This hierarchical approach allows EMERALD to effectively balance computational efficiency and accuracy, significantly reducing missed deadlines: it achieves 11x and 12.3x fewer missed deadlines than CAMDNN and HEFT, respectively, in scenarios demanding 30 frames per second. The results underscore the critical role of adaptive input scaling in managing the complexities of edge-based ML deployments.
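
The sketch below is a minimal, illustrative approximation of the local greedy resolution-scaling step described in the abstract: when a DNN task is predicted to miss its deadline, the scheduler picks the candidate input resolution that sacrifices the least accuracy while bringing latency within budget. All class names, the scoring rule, and the profiled numbers are assumptions for illustration, not the paper's actual implementation.

```python
"""Hypothetical sketch of greedy input-resolution selection under a per-frame
deadline. Names, data structures, and numbers are illustrative assumptions."""

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResolutionOption:
    """One candidate input resolution with profiled cost/quality estimates."""
    side_px: int           # input side length, e.g. 320, 256, 192
    est_latency_ms: float  # profiled inference latency at this resolution
    est_accuracy: float    # profiled accuracy (e.g. top-1) at this resolution


def pick_resolution(options: List[ResolutionOption],
                    deadline_ms: float) -> Optional[ResolutionOption]:
    """Greedy choice: among resolutions whose estimated latency meets the
    deadline, keep the one with the highest estimated accuracy; if none fits,
    fall back to the fastest option to minimize the overrun."""
    feasible = [o for o in options if o.est_latency_ms <= deadline_ms]
    if feasible:
        return max(feasible, key=lambda o: o.est_accuracy)
    return min(options, key=lambda o: o.est_latency_ms) if options else None


if __name__ == "__main__":
    # Hypothetical latency/accuracy profile for one DNN on one edge accelerator.
    profile = [
        ResolutionOption(320, 41.0, 0.78),
        ResolutionOption(256, 27.5, 0.74),
        ResolutionOption(192, 16.8, 0.69),
    ]
    # A 30 fps requirement leaves roughly 33 ms per frame for this task.
    chosen = pick_resolution(profile, deadline_ms=33.0)
    print(chosen)  # -> the 256 px option: fits the budget with least accuracy loss
```

In the paper's design this per-task decision is then refined by the global ILP-based scheduler, which also accounts for DNN dependencies, scene complexity, and hardware heterogeneity across the MPSoC.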