Keywords: Intrinsic Motivation, MARL, UAVs, Wildfire
Abstract: Wildfires are escalating in frequency and severity, particularly in high-risk regions such as Alberta, Canada, where traditional detection systems are becoming increasingly insufficient. Existing approaches often rely on centralized control or overlook key constraints, such as partial observability, terrain complexity, and communication limitations. To address this gap, we propose a fully decentralized multi-agent reinforcement learning (MARL) framework for wildfire detection using UAV swarms. Our method integrates real geographic data into a grid-based simulator and employs intrinsic-motivation-enhanced Independent Proximal Policy Optimization (IPPO), allowing each agent to learn independently and adaptively. This design is well-suited for large-scale, unstructured environments where centralized coordination is infeasible. Agents learn to balance exploration, fire detection, and risk mitigation through a hybrid reward scheme. Experimental results in simulation demonstrate the effectiveness of our method for early and reliable wildfire detection in large, remote landscapes. This work lays the foundation for scalable, robust, and communication-efficient UAV swarm systems for wildfire monitoring, with significant potential to reduce ecological, economic, and human costs.
Submission Number: 11