Abstract: This paper investigates content distribution in a hotspot area in which multiple cache-enabled unmanned aerial vehicles (UAVs) are deployed to offload part of the data traffic in a heavily crowded cellular network. We formulate an optimization problem that minimizes the sum content acquisition delay of all users by designing the multiuser association and cache placement jointly with the UAV transmission power and trajectory over a given flight duration. The non-convexity of the formulated problem and the uncertainty of the dynamic environment make it difficult and impractical to solve with traditional optimization methods. We therefore model the problem as a partially observable stochastic game in which the macro base station (MBS) and the UAVs act as agents and interact with the environment to receive distinctive observations. To guide exploration, we propose a new exploration criterion, beyond the boundary of explored regions (BeBold), that gives each UAV agent an intrinsic reward when it explores beyond the boundary of the regions already explored. We then propose a Dual-Clip Proximal Policy Optimization (DC-PPO) algorithm to solve the problem. Extensive numerical results demonstrate that the proposed algorithm outperforms both the PPO-based algorithm and the DC-PPO-based algorithm without the exploration criterion.
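The abstract names two ingredients, a BeBold-style intrinsic reward and a dual-clip PPO objective, without giving their formulas. The minimal Python sketch below illustrates the commonly cited forms of both, as an assumption rather than the authors' implementation. The function names, the discrete hashable state keys, and the defaults `eps=0.2` and `c=3.0` are hypothetical choices made for illustration. The bonus is the positive part of the difference in inverse visit counts, and the episodic first-visit restriction found in some BeBold variants is omitted.

```python
from collections import Counter


def bebold_intrinsic_reward(counts, state, next_state):
    """Count-based bonus for moving from a well-visited state to a rarer one.

    Assumed form: max(1/N(next_state) - 1/N(state), 0), where N is a
    lifetime visit count. `counts` is updated in place.
    """
    counts[next_state] += 1  # register the visit to the new state
    inv_next = 1.0 / counts[next_state]
    inv_curr = 1.0 / max(counts[state], 1)  # guard against an uncounted start state
    return max(inv_next - inv_curr, 0.0)


def dual_clip_ppo_objective(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample surrogate objective (to be maximized) with dual clipping.

    Assumed form: the standard PPO clipped surrogate, with an extra lower
    bound c * advantage (c > 1) applied only when the advantage is negative,
    so that a very large probability ratio cannot dominate the update.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    standard = min(ratio * advantage, clipped_ratio * advantage)
    if advantage < 0:
        return max(standard, c * advantage)
    return standard


if __name__ == "__main__":
    visits = Counter({"cell_a": 5})  # hypothetical grid-cell identifiers
    print(bebold_intrinsic_reward(visits, "cell_a", "cell_b"))
    print(dual_clip_ppo_objective(ratio=10.0, advantage=-1.0))
```

In a multi-agent setting like the one described, each UAV agent would presumably keep its own visit counts and add the bonus to its extrinsic reward before the policy update; how the paper discretizes UAV positions into countable states is not stated in the abstract.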