Niagara+: Scheduling Live ML Analytics Across Heterogeneous Device Processors and Edge Servers

Published: 2025 · Last Modified: 27 Jan 2026 · IEEE Trans. Serv. Comput. 2025 · License: CC BY-SA 4.0
Abstract: Intelligent applications increasingly rely on live machine learning pipelines, i.e., sets of deep neural network (DNN) inference services, executed on mobile devices to meet functional requirements while preserving user data privacy. However, executing these DNN services on resource-constrained mobile devices is challenging: inference tasks suffer from low throughput and high energy consumption. To address this issue, we propose Niagara+, a novel system that improves throughput by jointly scheduling DNN inference services across the heterogeneous processors of a mobile device and offloading services to powerful edge servers. In doing so, Niagara+ must overcome two critical challenges: unpredictable workload dynamics and high scheduling complexity. To tackle these challenges, Niagara+ employs a predictive model to forecast incoming workload patterns and orchestrates service allocation across heterogeneous on-device processors and edge servers through a combination of two-step offline scheduling optimization and online service dispatching. We implemented Niagara+ and conducted comprehensive experiments demonstrating its superiority over state-of-the-art approaches: it reduces DNN service latency by up to 2.6× under high-bandwidth networks and 9.1× under low-bandwidth networks, while consistently meeting stringent inference latency requirements.
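The abstract page does not include the system's code, but the online dispatching idea it describes can be illustrated with a minimal, hypothetical sketch: among candidate execution targets (on-device processors and an edge server), pick the one with the lowest predicted end-to-end latency that still meets the inference deadline. All names, types, and numbers below are illustrative assumptions, not Niagara+'s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Target:
    """A candidate execution target: an on-device processor or an edge server."""
    name: str
    compute_latency_ms: float   # predicted inference time on this target
    transfer_latency_ms: float  # predicted input-upload time (0 for on-device)

def dispatch(targets: list[Target], deadline_ms: float) -> Target:
    """Choose the target with the lowest predicted end-to-end latency that
    meets the deadline; fall back to the overall fastest target if none does.
    (Hypothetical sketch of latency-aware dispatching, not the paper's code.)"""
    def total(t: Target) -> float:
        return t.compute_latency_ms + t.transfer_latency_ms
    feasible = [t for t in targets if total(t) <= deadline_ms]
    return min(feasible or targets, key=total)

if __name__ == "__main__":
    candidates = [
        Target("cpu",  compute_latency_ms=48.0, transfer_latency_ms=0.0),
        Target("gpu",  compute_latency_ms=21.0, transfer_latency_ms=0.0),
        Target("edge", compute_latency_ms=6.0,  transfer_latency_ms=12.0),
    ]
    # Under a 30 ms deadline, offloading to the edge (6 + 12 = 18 ms) wins;
    # under low bandwidth, a larger transfer_latency_ms would shift the
    # choice back to an on-device processor.
    print(dispatch(candidates, deadline_ms=30.0).name)
```

This captures only the per-request decision; the abstract's two-step offline optimization (not sketched here) would additionally plan service placement ahead of time using the forecasted workload.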