Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks
Abstract: Efficient spectrum allocation is vital for 5G/6G networks, yet traditional deep reinforcement learning (DRL) methods suffer from high sample complexity and unsafe exploration that can disrupt network stability. To address these challenges, we propose a meta-learning framework that learns a robust initial policy capable of rapid and safe adaptation to changing wireless conditions. We implement three meta-learning architectures: model-agnostic meta-learning (MAML), a recurrent neural network (RNN), and an RNN with a self-attention mechanism. We compare them against a DRL baseline and classical heuristic approaches in a dynamic integrated access/backhaul (IAB) environment. The attention-based agent achieves a peak throughput of $\approx 49$ Mbps, reduces signal-to-interference-plus-noise ratio (SINR) and latency violations by over 60% relative to proximal policy optimization (PPO), and attains 97% of the fairness level of the exhaustive-search upper bound. These results demonstrate that meta-learning enables data-efficient, reliable, and scalable spectrum management for next-generation wireless systems.
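For the MAML variant, the idea of an initial policy that adapts quickly can be written compactly. What follows is the standard MAML update, not a formula taken from this paper. The symbols $\theta$ (policy parameters), $\alpha$ and $\beta$ (inner and outer step sizes), $\mathcal{T}_i$ (a sampled task), and $\mathcal{L}_{\mathcal{T}_i}$ (the task loss) are assumptions introduced for illustration; the abstract does not specify the task distribution or loss actually used.

$$\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(\theta), \qquad \theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(\theta_i')$$

Under this reading, each task $\mathcal{T}_i$ would plausibly correspond to one wireless condition, and $\mathcal{L}_{\mathcal{T}_i}$ to a negative expected return. The first expression adapts the policy to a single condition, and the second updates the shared initialization so that this adaptation works well across conditions.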
External IDs: doi:10.1109/lwc.2026.3668310