Actively Adaptive Multi-Armed Bandit Based Beam Tracking for mmWave MIMO Systems

Ashim Kumar, Arghyadip Roy, Ratnajit Bhattacharjee

Published: 2024, Last Modified: 29 Sept 2024. Venue: WCNC 2024. License: CC BY-SA 4.0

Abstract: Beam tracking methods are instrumental for efficient use of the multi-gigahertz bandwidth available at mmWave frequencies. In this paper, we propose a Multi-armed Bandit (MAB) based Reinforcement Learning (RL) algorithm that periodically selects transmitter-receiver beam pairs so as to maximize the average spectral efficiency. Unlike a traditional Bayesian MAB-based approach, the proposed MAB algorithm can track a user as it moves across multiple correlation distances. The algorithm monitors the received signal strength to detect a change in the channel correlation and adjusts its strategy to adapt to the new channel conditions. We derive an upper bound on the regret of the proposed algorithm. The proposed algorithm is evaluated on channel data generated using the open-source simulator NYUSIM and is observed to outperform existing algorithms, thereby removing the need for repeated initial access procedures.
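The detect-and-adapt loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give; it swaps in plain UCB1 over beam pairs together with a simple reset-on-change rule. The class name `ChangeDetectingUCB`, the parameters `window` and `threshold`, the toy reward model in `simulate`, and the use of the normalized reward as a stand-in for received signal strength are all assumptions made for illustration.

```python
import math
import random
from collections import deque


class ChangeDetectingUCB:
    """UCB1 over beam pairs with a reset-on-change rule (illustrative only)."""

    def __init__(self, n_arms, window=10, threshold=0.3):
        self.n_arms = n_arms          # each arm stands for one Tx-Rx beam pair
        self.window = window          # assumed: recent samples kept per arm
        self.threshold = threshold    # assumed: mean shift that triggers a reset
        self.resets = 0
        self._reset()

    def _reset(self):
        # Forget all statistics so learning restarts for the new channel.
        self.counts = [0] * self.n_arms
        self.means = [0.0] * self.n_arms
        self.recent = [deque(maxlen=self.window) for _ in range(self.n_arms)]
        self.t = 0

    def select(self):
        # Play every arm once before using the UCB index.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        return max(
            range(self.n_arms),
            key=lambda a: self.means[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        """Record a reward in [0, 1]; return True if a change was declared."""
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        self.recent[arm].append(reward)
        full = len(self.recent[arm]) == self.window
        if full and self.counts[arm] >= 2 * self.window:
            recent_mean = sum(self.recent[arm]) / self.window
            # Declare a change when recent rewards drift from the long-run mean.
            if abs(recent_mean - self.means[arm]) > self.threshold:
                self._reset()
                self.resets += 1
                return True
        return False


def simulate(horizon=2000, change_at=1000, seed=0):
    """Toy channel: the best beam pair switches from arm 0 to arm 3."""
    rng = random.Random(seed)
    before = [0.9, 0.3, 0.2, 0.1]   # assumed mean normalized rewards
    after = [0.1, 0.2, 0.3, 0.9]
    agent = ChangeDetectingUCB(n_arms=4)
    picks = []
    for step in range(horizon):
        mean = before if step < change_at else after
        arm = agent.select()
        reward = min(1.0, max(0.0, rng.gauss(mean[arm], 0.05)))
        agent.update(arm, reward)
        picks.append(arm)
    return agent, picks
```

Calling `simulate()` returns the agent and the sequence of selected arms. In this toy setting the agent is expected to settle on arm 0, declare a change shortly after step 1000, and then relearn that arm 3 is best without any external re-initialization. A real system would replace the synthetic rewards with measured spectral efficiency or received signal strength per beam pair.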