Federated DRL-based Coordination of Multi-UAVs for Wildfire Tracking

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: UAV Formation Planning and Control, Fire Front Tracking, Multi-Agent Systems, Deep Reinforcement Learning, Federated Learning, Scalability
Abstract: Formation control remains a significant challenge in multi-agent deep reinforcement learning (DRL). This paper presents a formation strategy for multiple UAVs engaged in large-scale wildfire tracking. The proposed approach leverages the Deep Deterministic Policy Gradient (DDPG) algorithm to enable individual UAVs to adapt their path-planning and control policies in real time. Although effective for single-UAV scenarios, standard DDPG does not scale well to multi-UAV coordination. Moreover, wildfire fronts rarely evolve symmetrically: environmental factors such as wind, terrain, and fuel conditions can accelerate fire spread in certain directions, producing irregular boundaries that complicate the maintenance of uniformly spaced formations. To address these limitations, the proposed framework integrates Federated Learning (FL) with DDPG to enable collaborative policy refinement without exchanging raw data, introducing a novel performance-weighted federated averaging scheme that prioritizes policies from UAVs demonstrating better formation stability. Whereas conventional FL uses equal-weight aggregation, our approach applies distance- and performance-based weighting to better handle the non-IID data distributions that arise from asymmetric wildfire fronts, making weighted aggregation essential for stable coordination. We apply FL to the DDPG components governing linear velocity and its corresponding control gain, both of which are critical for acceleration control and inter-UAV spacing. Simulation results indicate that our method, FL-DDPG, yields significantly improved formation stability, with an average spacing variance of 2.5 m compared to 14 m for standard DDPG, and improves the average episode reward from –355.45 to –122.21. Overall, the results underscore the importance of performance-weighted aggregation in achieving robust, decentralized coordination in complex wildfire environments.
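The performance-weighted aggregation described in the abstract can be sketched in a few lines. The minimal Python example below assumes the per-UAV weight is derived from inverse spacing variance (the paper's exact distance- and performance-based formula is not given here), and that only the actor parameters tied to linear velocity and its control gain are federated while the rest stay local. All names (`performance_weighted_fedavg`, `velocity_head`, `control_gain`) are illustrative, not the authors' implementation.

```python
import numpy as np

def performance_weighted_fedavg(local_params, spacing_variances, shared_keys):
    """Aggregate a selected subset of DDPG parameters across UAVs,
    weighting each UAV by its formation stability (here approximated
    as inverse spacing variance) instead of the equal weights used
    by conventional FedAvg."""
    # Lower spacing variance -> larger weight; normalize to sum to 1.
    inv = np.array([1.0 / (v + 1e-8) for v in spacing_variances])
    weights = inv / inv.sum()

    # Weighted average over the shared parameter subset only; the
    # remaining DDPG parameters are never exchanged (no raw data leaves
    # a UAV, only these model weights).
    global_update = {}
    for key in shared_keys:
        stacked = np.stack([p[key] for p in local_params])
        global_update[key] = np.tensordot(weights, stacked, axes=1)
    return global_update

# Hypothetical round with three UAVs sharing the actor components that
# govern linear velocity and the corresponding control gain.
rng = np.random.default_rng(0)
uavs = [{"velocity_head": rng.normal(size=(4,)),
         "control_gain": rng.normal(size=(1,))} for _ in range(3)]
update = performance_weighted_fedavg(
    uavs,
    spacing_variances=[2.5, 6.0, 14.0],  # per-UAV formation stability
    shared_keys=["velocity_head", "control_gain"])
print(update)
```

Under this weighting, the UAV with the tightest spacing (variance 2.5) dominates the aggregate, which is the intended behavior when asymmetric fire fronts make some UAVs' local experience less reliable than others'.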
Primary Area: reinforcement learning
Submission Number: 10338