Adapting Multi-Model Inference Pipelines With Diffusion-Based Reinforcement Learning in Edge Computing
Abstract: The increasing demand for real-time processing tasks is driving the need for multi-model inference pipelines on edge devices. However, deploying these pipelines while jointly optimizing Quality of Service (QoS) and cost presents significant challenges. Existing solutions often overlook device resource constraints, focusing primarily on inference accuracy and cost efficiency. To tackle this issue, we develop a framework for adaptively configuring multi-model inference pipelines. Specifically: 1) We model the multi-model inference pipeline adaptation problem by considering pipeline costs and device resource limitations. 2) We create a feature extraction module using residual networks and a load prediction model based on long short-term memory to gather comprehensive node and pipeline status information, and then employ a diffusion-based reinforcement learning algorithm for online configuration decision-making. 3) Experiments in a real Kubernetes cluster demonstrate that our approach significantly improves QoS, reduces costs, and shortens decision-making time for complex pipelines compared to baseline algorithms.
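The diffusion-based decision step described above can be illustrated with a minimal sketch: a configuration vector (e.g., per-model replica counts) is sampled by starting from Gaussian noise and iteratively denoising it. The names below (`eps_theta`, `sample_config`, `TARGET`) and the toy denoiser are assumptions for illustration, not the paper's actual model; a real implementation would condition a learned noise predictor on the extracted node and pipeline state features.

```python
import random

T = 20                      # number of denoising steps (assumed)
TARGET = [2.0, 1.0, 3.0]    # illustrative "good" configuration

def eps_theta(x, t):
    """Toy noise predictor: points from x toward TARGET.
    Stands in for a learned network conditioned on node/pipeline state."""
    return [(xi - ti) for xi, ti in zip(x, TARGET)]

def sample_config(dim=3, seed=0):
    """Reverse-diffusion sampling of a pipeline configuration."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in range(T, 0, -1):
        eps = eps_theta(x, t)
        # Deterministic update (DDIM-like: no noise re-injected per step)
        x = [xi - 0.2 * ei for xi, ei in zip(x, eps)]
    # Round to valid discrete replica counts, at least one replica each
    return [max(1, round(xi)) for xi in x]

print(sample_config())  # converges to a configuration near TARGET
```

Each denoising step contracts the sample toward the region the (here, toy) denoiser favors; after 20 steps the residual noise is scaled by 0.8^20 ≈ 0.01, so the rounded output matches the target configuration.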
External IDs: dblp:journals/tsc/ShengTGYWJ26