Multiagent Deep Reinforcement Learning for Joint Movement and User Association of UAV-BS Emergency Indoor User Service

Tae-Yoon Kim, Jihong Park, Junghwa Kang, Jaeyeol Lee, Soyi Jung, Jae-Hyun Kim

Published: 2026, Last Modified: 01 Mar 2026. IEEE Internet Things J. 2026. License: CC BY-SA 4.0
Abstract: This article investigates the joint optimization of movement and user association for multiple uncrewed aerial vehicle-mounted base stations (UAV-BSs) providing emergency services to indoor users. Under an outdoor-to-indoor path loss model that accounts for floor penetration, the objective is twofold: determine the optimal associations between UAV-BSs and indoor users, and decide how the multiple UAV-BSs should move to establish those associations quickly. To solve this problem, we propose a novel multiagent reinforcement learning (MARL) architecture featuring three key innovations: a dual-action structure that decouples the complex decision-making process into separate movement and association actions, a multiagent double deep $Q$-network (MADDQN) that learns the optimal policies, and prioritized experience replay (PER) that improves learning efficiency. Simulation results demonstrate that the proposed algorithm significantly outperforms baseline methods, including a multiagent deep $Q$-network (MADQN), multiagent independent actor-critic (MAIAC), multiagent deep deterministic policy gradient (MADDPG), and a consensus-based bundle algorithm (CBBA), across all metrics. Furthermore, a series of rigorous ablation studies systematically validates the contribution of each component. Overall, the simulation results confirm the superiority and robustness of the proposed algorithm in dynamic and challenging indoor environments.
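The PER component named in the abstract can be illustrated with a minimal sketch of a standard proportional prioritized replay buffer, in which transitions are sampled with probability proportional to their (TD-error-derived) priority raised to a power $\alpha$, and importance-sampling weights correct the resulting bias. The class name, default hyperparameters, and list-based storage below are illustrative assumptions, not details taken from the paper:

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional PER sketch (illustrative, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha       # how strongly priorities skew sampling
        self.eps = eps           # keeps every priority strictly positive
        self.buffer = []         # stored transitions
        self.priorities = []     # one priority per transition
        self.pos = 0             # ring-buffer write index

    def push(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_p)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # P(i) proportional to priority_i ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the max weight.
        n = len(self.buffer)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.buffer[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities track the new TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a MADDQN-style training loop each agent would push its transitions, sample a weighted minibatch for the double-DQN update, and feed the absolute TD errors back via `update_priorities`, so high-error transitions are replayed more often.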