Scheduling Inference Workloads in the Computing Continuum with Reinforcement Learning

Published: 30 Jun 2025 · Last Modified: 15 May 2025 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: As many recent real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) for inference tasks, edge computing has emerged as a key enabler to deploy such applications as close as possible to the data sources, helping meet stringent latency and throughput demands. However, the limited resources typically available at the edge create significant challenges for efficiently managing inference workloads, so a trade-off between network and processing time must be considered when meeting end-to-end delay requirements. In this paper, we focus on the problem of scheduling inference jobs of DNN models in such an edge-cloud continuum at short timescales (i.e., a few milliseconds). Through simulations, we analyze several policies under realistic network settings and workloads from a large ISP, highlighting the need for a dynamic scheduling policy that can adapt to varying network conditions and workload demands. To this end, we propose ASET, a Reinforcement Learning (RL)-based scheduling algorithm able to dynamically adapt its decisions to the system conditions. Our results show that ASET outperforms a set of static policies when scheduling over a distributed pool of edge-cloud resources.
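To make the network-vs-processing trade-off mentioned above concrete, here is a minimal sketch of a static baseline scheduler that picks the execution site minimizing an estimated end-to-end delay (network round-trip plus queueing plus inference time). All names, fields, and numbers are hypothetical illustrations and are not taken from the paper; this is the kind of static policy the paper compares against, not the ASET algorithm itself, which adapts its decisions with reinforcement learning.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A candidate edge or cloud execution site (hypothetical model)."""
    name: str
    rtt_ms: float    # round-trip network delay to the data source
    queue_ms: float  # current queueing delay at the node
    proc_ms: float   # expected DNN inference time on this node's hardware

def estimated_e2e_delay(node: Node) -> float:
    """End-to-end delay estimate: network + queueing + processing time."""
    return node.rtt_ms + node.queue_ms + node.proc_ms

def greedy_schedule(nodes: list[Node]) -> Node:
    """Static baseline: dispatch the job to the node with the lowest estimate."""
    return min(nodes, key=estimated_e2e_delay)

if __name__ == "__main__":
    pool = [
        Node("edge-1", rtt_ms=2.0, queue_ms=6.0, proc_ms=9.0),    # close but loaded
        Node("cloud-1", rtt_ms=12.0, queue_ms=0.5, proc_ms=3.0),  # far but fast
    ]
    best = greedy_schedule(pool)
    print(f"dispatch to {best.name} (~{estimated_e2e_delay(best):.1f} ms)")
```

A fixed rule like this can be beaten whenever network conditions or workload mix change, which is the motivation for the adaptive, RL-based policy proposed in the paper.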