Deep Reinforcement Learning for Beam Management in UAV Relay mmWave Networks

Dohyun Kim, Miguel R. Castellanos, Robert W. Heath Jr.

Published: 01 Jan 2024, Last Modified: 15 May 2025 · IEEE Commun. Mag. 2024 · CC BY-SA 4.0

Abstract: Unmanned aerial vehicles (UAVs) offer a means to relay signals around obstacles in millimeter wave (mmWave) mobile ad hoc networks. Achieving these benefits, however, requires a dynamic beam management strategy that efficiently allocates resources for discovering, configuring, and exploiting communication links. Balancing these tasks is difficult because the time spent on beam acquisition and tracking directly reduces the data rate achievable over the link. In this article, we showcase how deep reinforcement learning (DRL) can jointly address the problems of blockage and mobility in mmWave ad hoc networks. We first summarize the problem of relay selection with realistic overhead penalties, in which the beam management training time is characterized and minimized through a sequential decision-making approach. We then describe how hierarchical learning can be leveraged to choose between distinct frequency bands for communication by addressing the issues posed by their differing precoding training procedures. We conclude with an overview of how learning algorithms will be an important tool for overcoming the challenges faced by future ad hoc networks.
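To make the overhead-versus-rate trade-off concrete, the following is a minimal sketch under stated assumptions, not the authors' method. It assumes a hypothetical overhead-penalized reward of the form (1 - T_train / T_frame) · B · log2(1 + SNR), where beam training time grows with the number of beam pairs swept. All relay names, SNRs, beam counts, and timing values are illustrative assumptions. A simple epsilon-greedy bandit stands in for the DRL agent described in the article, so the example ignores state (UAV position, blockage history) and only shows why neither the highest-SNR relay nor the cheapest-to-train relay need be the best choice.

```python
import math
import random

# Hypothetical toy parameters (assumptions, not values from the article).
T_BEAM = 10e-6       # time to measure one beam pair, seconds
T_FRAME = 5e-3       # frame duration, seconds
BANDWIDTH = 400e6    # channel bandwidth, Hz
# Hypothetical relays: name -> (mean end-to-end SNR in dB, beam pairs to sweep)
RELAYS = {
    "relay_A": (20.0, 64),
    "relay_B": (15.0, 8),
    "relay_C": (25.0, 256),
}


def effective_rate(snr_db, n_beams):
    """Assumed reward: (1 - T_train / T_frame) * B * log2(1 + SNR), in bit/s."""
    t_train = min(n_beams * T_BEAM, T_FRAME)  # training cannot exceed the frame
    snr = 10 ** (snr_db / 10)
    return (1 - t_train / T_FRAME) * BANDWIDTH * math.log2(1 + snr)


def select_relay(steps=2000, epsilon=0.1, shadowing_db=2.0, seed=0):
    """Epsilon-greedy relay selection with sample-average value estimates.

    Each step is one frame: pick a relay, pay its beam-training overhead,
    and observe the overhead-penalized rate under random SNR fluctuation.
    """
    rng = random.Random(seed)
    names = list(RELAYS)
    value = {n: 0.0 for n in names}
    count = {n: 0 for n in names}
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(names)  # explore a random relay
        else:
            choice = max(names, key=value.get)  # exploit the current best estimate
        mean_snr_db, n_beams = RELAYS[choice]
        reward = effective_rate(rng.gauss(mean_snr_db, shadowing_db), n_beams)
        count[choice] += 1
        value[choice] += (reward - value[choice]) / count[choice]  # running mean
    return max(names, key=value.get), value


if __name__ == "__main__":
    best, estimates = select_relay()
    for name, v in estimates.items():
        print(f"{name}: {v / 1e9:.2f} Gbit/s")
    print("selected:", best)
```

With these assumed numbers, relay_C has the strongest link but loses about half of each frame to beam sweeping, while relay_B trains quickly but has a weak link; the agent settles on relay_A, which balances the two. A DRL agent as described in the article would additionally condition this choice on the observed network state and plan over multiple frames.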