User Throughput Optimization via Deep Reinforcement Learning for Beam Switching in mmWave Radio Access Networks
Abstract: In 5G, the analog beamforming architecture typically used in millimeter wave (mmWave) bands operates on a pre-defined beam set following the grid-of-beams (GoB) approach. Efficient beam selection that accounts for dynamic wireless environments and incorporates the available measurements from the user equipment (UE) is therefore of paramount importance for enhancing overall throughput. A typical beam selection scheme chooses, out of the candidate beams, the beam with the strongest reference signal received power (RSRP). This approach may not yield the optimal outcome: for example, the selected beams might be reserved for UEs that cannot utilize all the available resource blocks (RBs) efficiently. To this end, a deep reinforcement learning (DRL)-based beam selection framework is introduced to maximize user throughput. We first formulate the beam selection problem, utilizing the RSRP measurements and time-domain activation information of the beams to group UEs so that time and frequency resources are utilized more efficiently, resulting in higher observed throughput. The formulated problem is complex, as the optimized beam selection policy must balance signal quality and RB utilization simultaneously. We therefore employ recent advances in DRL to solve the formulated problem with an optimized policy. Specifically, the proximal policy optimization (PPO) method, incorporating recurrent neural networks (RNNs) to capture the temporal properties of the wireless channel, is used for training. Simulations demonstrate that throughput gains of up to 10% can be achieved compared with baseline approaches.
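For reference, the RSRP-based baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the measurement values are hypothetical.

```python
# Hedged sketch of the RSRP-argmax baseline beam selection described in
# the abstract: each UE is assigned the candidate beam with the strongest
# reference signal received power (RSRP). Values are illustrative only.

def select_beam_max_rsrp(rsrp_dbm):
    """Return the index of the candidate beam with the strongest RSRP (dBm)."""
    return max(range(len(rsrp_dbm)), key=lambda b: rsrp_dbm[b])

# Example: one UE reports RSRP measurements for 4 candidate beams.
measurements = [-95.0, -88.5, -91.2, -102.3]
best_beam = select_beam_max_rsrp(measurements)  # beam 1, the -88.5 dBm beam
```

As the abstract notes, such a per-UE argmax ignores resource-block utilization across UEs sharing a beam, which is the gap the DRL policy is designed to close.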