Energy-Efficient Trajectory Optimization With Wireless Charging in UAV-Assisted MEC Based on Multi-Objective Reinforcement Learning
Abstract: This paper investigates the problem of energy-efficient trajectory optimization with wireless charging (ETWC) in an unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system. A UAV is dispatched to collect computation tasks from specific ground smart devices (GSDs) within its coverage while transmitting energy to the remaining GSDs. In addition, a high-altitude platform equipped with a laser beam is deployed in the stratosphere to charge the UAV so that it can sustain its flight mission. The ETWC problem is formulated as a multi-objective optimization that aims to maximize both the energy efficiency of the UAV and the number of collected tasks by optimizing the UAV's flight trajectory. The conflict between these two objectives makes the problem particularly challenging. Recently, single-objective reinforcement learning (SORL) algorithms have been applied to this problem. However, these SORLs adopt linear scalarization to define the user utility and thus ignore the conflict between objectives. Furthermore, in dynamic MEC scenarios, the relative importance assigned to each objective may vary over time, which poses significant challenges for conventional SORLs. To address these challenges, we first build a multi-objective Markov decision process with a vectorial reward, in which each reward component corresponds to one of the two objectives. We then propose a new trace-based experience replay scheme that improves sample efficiency and reduces replay-buffer bias, yielding a modified multi-objective reinforcement learning algorithm. Experimental results validate that the proposed algorithm adapts better to dynamic preferences and achieves a more favorable balance between objectives than several baseline algorithms.
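The contrast the abstract draws between a vectorial reward and linear scalarization can be illustrated with a minimal sketch. All function names and numeric values below are illustrative assumptions, not the paper's actual system model; the sketch only shows why a fixed scalarization bakes in one trade-off between the two objectives.

```python
import numpy as np

def vector_reward(energy_efficiency, tasks_collected):
    """Two-component reward of a multi-objective MDP: one component per
    objective (UAV energy efficiency, number of tasks collected).
    Illustrative only; the paper's reward definition may differ."""
    return np.array([energy_efficiency, tasks_collected], dtype=float)

def linear_scalarization(reward_vec, preference):
    """Collapse the vector reward into a scalar via preference weights.
    This is the SORL-style utility the abstract criticizes: the weights
    fix the trade-off a priori and hide the conflict between objectives."""
    preference = np.asarray(preference, dtype=float)
    return float(preference @ reward_vec)

# Hypothetical one-step outcome: energy efficiency 0.8, 5 tasks collected.
r = vector_reward(0.8, 5.0)

# Two different preference settings yield different scalar utilities, so a
# policy trained under one fixed weighting need not suit the other.
u_energy_first = linear_scalarization(r, [0.9, 0.1])  # 1.22
u_tasks_first = linear_scalarization(r, [0.1, 0.9])   # 4.58
```

Keeping the reward vectorial, as the proposed multi-objective MDP does, defers this weighting decision so the agent can adapt when preferences change at run time.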