Deep Reinforcement Learning Based Computation Offloading in Heterogeneous MEC Assisted by Ground Vehicles and Unmanned Aerial Vehicles

Published: 01 Jan 2022, Last Modified: 14 Nov 2024, WASA (3) 2022, CC BY-SA 4.0
Abstract: Compared with traditional mobile edge computing (MEC), heterogeneous MEC (H-MEC), which is assisted simultaneously by ground vehicles (GVs) and unmanned aerial vehicles (UAVs), is attracting increasing attention from both academia and industry. By deploying base stations (along with edge servers) on GVs or UAVs, H-MEC is better suited to network environments with dynamically changing access demand, e.g., sports matches, traffic management, and emergency rescue. However, it is non-trivial to perform real-time user association and resource allocation in large-scale H-MEC environments. Motivated by this, we propose a shared multi-agent proximal policy optimization (SMAPPO) algorithm based on the centralized-training, distributed-execution framework. Because jointly optimizing user association and resource allocation for H-MEC is NP-hard, we adopt an actor-critic-based on-policy gradient (PG) algorithm to obtain near-optimal solutions with low scheduling complexity. In addition, considering the low sampling efficiency of PG methods, we introduce proximal policy optimization, which improves training efficiency through importance sampling. Moreover, we leverage centralized training and distributed execution to further improve training efficiency and reduce scheduling complexity, so that each mobile device (MD) makes decisions based only on its local observation while learning from other MDs' experience through a shared replay buffer. Extensive simulation results demonstrate that SMAPPO achieves better performance than traditional algorithms.
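The paper itself does not include code; as an illustration of the importance-sampling mechanism the abstract attributes to PPO, the PyTorch sketch below shows the standard clipped surrogate objective. The function name, tensor shapes, and clipping coefficient are assumptions for illustration, not details from the paper.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective in the style of PPO (a sketch, not the paper's code).

    The importance-sampling ratio r = pi_new(a|s) / pi_old(a|s) lets the
    policy be updated several times from the same sampled trajectories,
    which is the sampling-efficiency gain the abstract attributes to PPO.
    """
    ratio = torch.exp(logp_new - logp_old)  # importance-sampling ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (min) surrogate; negate to get a loss to minimize.
    return -torch.min(unclipped, clipped).mean()

# Toy usage: in a shared-buffer setup like SMAPPO's, each agent's batch would
# be drawn from experience pooled across all MDs (hypothetical shapes below).
logp_new = torch.randn(32, requires_grad=True)
logp_old = logp_new.detach() + 0.1 * torch.randn(32)
advantages = torch.randn(32)
loss = ppo_clip_loss(logp_new, logp_old, advantages)
loss.backward()  # gradients flow into the actor that produced logp_new
```

Clipping the ratio to [1 - eps, 1 + eps] bounds how far a single update can move the policy away from the behavior policy that collected the data, which is what makes the repeated off-batch updates stable.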