Abstract: This work addresses the secure beamforming problem in massive multiple-input multiple-output (MIMO) systems. We model the optimization problem within the framework of reinforcement learning (RL) and, exploiting the asymptotic behavior of massive MIMO, present a detailed theoretical analysis of the resulting RL problem. We solve the delay-aware, large-scale RL problem with a policy gradient method. The proposed RL structure dynamically optimizes system performance by observing the cache state and using packet delay as feedback. Because it requires no channel estimation, it avoids the imperfect channel state information (CSI) issue in massive MIMO systems. In numerical experiments, we solve the proposed RL problem with the asynchronous advantage actor-critic (A3C) algorithm and compare it against a randomized policy in a time-variant wireless environment. The results show that the RL algorithm reduces system delay without using CSI.
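
To make the delay-feedback idea concrete, the following is a minimal sketch and not the method of this work. It replaces A3C with plain REINFORCE plus a tabular per-state baseline, and it replaces the massive MIMO model with a hypothetical single-queue toy. The agent observes only the cache backlog, picks one of three hypothetical beams, and receives the negative backlog as a delay proxy; the per-beam delivery probabilities stand in for the channel and are never shown to the agent. All constants and function names are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy setup: none of these numbers come from the paper.
SERVE_PROB = [0.1, 0.3, 0.9]  # hidden per-beam delivery probability (stands in for the channel)
ARRIVAL_PROB = 0.5            # chance that a new packet enters the cache in each slot
MAX_QUEUE = 5                 # cache capacity; arrivals to a full cache are dropped
HORIZON = 50                  # time slots per episode
GAMMA = 0.9                   # discount factor


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng):
    """Roll out one episode; return a list of (state, action, reward)."""
    queue, trajectory = 0, []
    for _ in range(HORIZON):
        state = queue  # the agent sees only the cache backlog, never the channel
        probs = softmax(theta[state])
        action = rng.choices(range(len(probs)), weights=probs)[0]
        if rng.random() < ARRIVAL_PROB and queue < MAX_QUEUE:
            queue += 1
        if queue > 0 and rng.random() < SERVE_PROB[action]:
            queue -= 1
        trajectory.append((state, action, -float(queue)))  # reward = negative backlog
    return trajectory


def train(episodes=2000, lr=0.005, seed=0):
    """REINFORCE with a running per-state baseline (a simple stand-in for a critic)."""
    rng = random.Random(seed)
    n_actions = len(SERVE_PROB)
    theta = [[0.0] * n_actions for _ in range(MAX_QUEUE + 1)]
    baseline = [0.0] * (MAX_QUEUE + 1)
    for _ in range(episodes):
        trajectory = run_episode(theta, rng)
        policy = [softmax(row) for row in theta]  # the policy that generated this episode
        ret = 0.0
        for state, action, reward in reversed(trajectory):
            ret = reward + GAMMA * ret            # discounted return from this slot onward
            advantage = ret - baseline[state]
            baseline[state] += 0.05 * advantage   # running average of returns per state
            for a in range(n_actions):
                # Gradient of log softmax: indicator(a == action) - pi(a | state)
                grad = (1.0 if a == action else 0.0) - policy[state][a]
                theta[state][a] += lr * advantage * grad
    return theta


def mean_backlog(theta, episodes=200, seed=1):
    """Average backlog per slot under the softmax policy defined by theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        total += sum(-r for _, _, r in run_episode(theta, rng))
    return total / (episodes * HORIZON)


if __name__ == "__main__":
    uniform = [[0.0] * len(SERVE_PROB) for _ in range(MAX_QUEUE + 1)]
    print("uniform policy backlog:", mean_backlog(uniform))
    print("trained policy backlog:", mean_backlog(train()))
```

In this toy the all-zero preference table gives a uniformly random beam choice, which plays the role of the randomized baseline policy. The per-state baseline is a deliberately simple substitute for the learned critic in an actor-critic method such as A3C.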