On Adaptive Edge Microservice Placement: A Reinforcement Learning Approach Endowed With Graph Comprehension

Lixing Chen; Yang Bai; Pan Zhou; Youqi Li; Zhe Qu; Jie Xu

Published: 01 Jan 2024. Last Modified: 15 Jan 2025. Venue: IEEE Trans. Mob. Comput. 2024. License: CC BY-SA 4.0.

Abstract: The microservice (MS) architecture structures a service application as a collection of independently deployable service modules, making it particularly suitable for delivering complex applications in distributed computing systems. This article investigates MS architecture over Mobile Edge Computing (MEC) networks (hereafter referred to as EdgeMS) and studies an EdgeMS placement problem that aims to deploy MS modules over the MEC network so as to maximize the reward of MS application providers. A novel algorithm called Dual-GNN Deep Deterministic Policy Gradient (DG-DDPG) is proposed to establish an intelligent EdgeMS placement policy that optimizes the location of MS modules and performs fractional computing resource allocation. DG-DDPG leverages graph neural networks (GNNs) to comprehend the graph-structured information encapsulated in the MS application structure and the MEC network. A dual-GNN core is constructed in DG-DDPG: one GNN for MS applications, which distills knowledge from the intricate connections between MS modules, and another GNN for MEC networks, which captures the complicated interactions between edge sites when providing EdgeMS. DG-DDPG embeds the dual-GNN core in a DDPG-based reinforcement learning framework, which not only handles temporal dependencies between EdgeMS placement decisions to maximize long-term reward but also supports a continuous action space, enabling fractional resource allocation. In particular, the learning process of DG-DDPG is tailored to address hard constraints (i.e., computing capacity and MS application completeness) in the EdgeMS placement problem. We design constraint-based regularization terms and add them to the objective of DG-DDPG, which facilitates the identification of feasible placement decisions during learning. We carry out systematic experiments to evaluate the performance of DG-DDPG, and the results show that DG-DDPG outperforms state-of-the-art benchmarks in terms of reward, service delay, and deployment cost.
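The abstract does not state the exact form of the constraint-based regularization terms. As a rough illustration only, the sketch below assumes hinge-style penalties added to a standard DDPG actor loss (which minimizes -Q(s, mu(s))): one penalty for computing capacity exceeded at each edge site, and one for unmet computing demand of each MS module as a stand-in for "application completeness". The names `alloc`, `capacity`, `demand`, `lam_cap`, `lam_comp` and all function names are hypothetical and not taken from the paper; a real training loop would compute these terms on autodiff tensors rather than Python floats.

```python
from typing import List, Sequence


def capacity_violation(alloc: List[List[float]], capacity: Sequence[float]) -> float:
    """Hypothetical capacity penalty.

    alloc[m][e] is the computing resource allocated to module m at edge site e.
    Returns the total amount by which allocations exceed each site's capacity.
    """
    total = 0.0
    for e, cap in enumerate(capacity):
        used = sum(row[e] for row in alloc)  # compute used at site e
        total += max(0.0, used - cap)        # hinge: only overuse is penalized
    return total


def completeness_violation(alloc: List[List[float]], demand: Sequence[float]) -> float:
    """Hypothetical completeness penalty: total unmet demand over all modules."""
    total = 0.0
    for m, d in enumerate(demand):
        placed = sum(alloc[m])               # compute given to module m across sites
        total += max(0.0, d - placed)        # hinge: only shortfall is penalized
    return total


def regularized_actor_loss(
    q_value: float,
    alloc: List[List[float]],
    capacity: Sequence[float],
    demand: Sequence[float],
    lam_cap: float = 10.0,
    lam_comp: float = 10.0,
) -> float:
    """Assumed actor objective: -Q plus weighted constraint penalties."""
    return (
        -q_value
        + lam_cap * capacity_violation(alloc, capacity)
        + lam_comp * completeness_violation(alloc, demand)
    )


if __name__ == "__main__":
    # Two modules, two edge sites (toy numbers, not from the paper).
    alloc = [[1.5, 0.0], [1.0, 0.5]]
    capacity = [2.0, 1.0]
    demand = [1.5, 2.0]
    print(capacity_violation(alloc, capacity))                    # 0.5
    print(completeness_violation(alloc, demand))                  # 0.5
    print(regularized_actor_loss(3.0, alloc, capacity, demand))   # 7.0
```

In this toy case site 0 is overloaded by 0.5 and module 1 is short by 0.5, so each penalty contributes 10 × 0.5 to the loss. The hinge form keeps the penalty at zero for feasible actions, so it only pushes the actor when a constraint is actually violated; the weights trade off reward against feasibility and would need tuning.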