Towards Safe Reinforcement Learning with Reduced Conservativeness: A Case Study on Drone Flight Control

Loizos Hadjiloizou, Michael C. Welle, Hang Yin, Danica Kragic

Published: 2025, Last Modified: 15 Jan 2026, IROS 2025, CC BY-SA 4.0

Abstract: Incorporating formal methods into reinforcement learning (RL) has the potential to deliver the best of both worlds, combining the robustness of formal guarantees with the adaptability and learning capabilities of RL. However, formal guarantees tend to be conservative and restrict exploration, so careful design is needed to balance safety and exploration. In this work, we propose a framework that mitigates this loss of exploration while still ensuring the safety of the system. Specifically, we introduce a less restrictive method that reduces the conservativeness of formal methods by refining a disturbance model using data collected online, and that evaluates the safety of a learning-based controller using computationally efficient zonotopic reachability analysis, which facilitates real-time implementation. We validate the framework in a real-world drone flight through a canyon, where the drone is subjected to unknown external disturbances and the framework is tasked with learning those disturbances online and adjusting the safety guarantees accordingly. The results show that the framework enables less restrictive online training of learning-based controllers without compromising the safety of the system.
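
The idea summarized in the abstract can be illustrated with a minimal sketch of zonotopic reachability under a data-refined disturbance bound. This is not the authors' implementation: the linear double-integrator dynamics, the box-shaped disturbance model fitted to residuals, the numeric values, and all names (`Zonotope`, `disturbance_box`, `reach_is_safe`) are assumptions made purely for illustration.

```python
import numpy as np


class Zonotope:
    """Set {c + G @ b : b in [-1, 1]^m} with center c and generator matrix G."""

    def __init__(self, center, generators):
        self.c = np.asarray(center, dtype=float)
        self.G = np.asarray(generators, dtype=float)  # shape (n, m)

    def linear_map(self, A):
        # Image of the set under x -> A @ x.
        return Zonotope(A @ self.c, A @ self.G)

    def translate(self, v):
        # Shift the set by a known vector v.
        return Zonotope(self.c + v, self.G)

    def minkowski_sum(self, other):
        # Sum of two zonotopes: add centers, concatenate generators.
        return Zonotope(self.c + other.c, np.hstack([self.G, other.G]))

    def interval_hull(self):
        # Tightest axis-aligned box containing the zonotope.
        r = np.abs(self.G).sum(axis=1)
        return self.c - r, self.c + r


def disturbance_box(residuals, margin=0.0):
    """Hypothetical refinement: bound observed residuals by a box zonotope."""
    r = np.asarray(residuals, dtype=float)
    lo, hi = r.min(axis=0) - margin, r.max(axis=0) + margin
    return Zonotope((lo + hi) / 2, np.diag((hi - lo) / 2))


def reach_is_safe(X0, A, B, inputs, W, lo_safe, hi_safe):
    """Propagate X_{k+1} = A X_k + B u_k (+) W and check each step's bounds."""
    X = X0
    for u in inputs:
        X = X.linear_map(A).translate(B @ u).minkowski_sum(W)
        lo, hi = X.interval_hull()
        if np.any(lo < lo_safe) or np.any(hi > hi_safe):
            return False
    return True


# Assumed lateral dynamics: state = (position, velocity), input = acceleration.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
X0 = Zonotope(np.zeros(2), np.diag([0.05, 0.05]))
inputs = [np.zeros(1)] * 10                      # candidate action sequence
walls_lo = np.array([-1.0, -np.inf])             # canyon walls at +/- 1 m
walls_hi = np.array([1.0, np.inf])

# Conservative prior disturbance bound versus a bound refined from data.
W_prior = Zonotope(np.zeros(2), np.diag([0.2, 0.2]))
residuals = [[0.01, -0.02], [-0.01, 0.02], [0.0, 0.0]]  # made-up samples
W_refined = disturbance_box(residuals)

safe_prior = reach_is_safe(X0, A, B, inputs, W_prior, walls_lo, walls_hi)
safe_refined = reach_is_safe(X0, A, B, inputs, W_refined, walls_lo, walls_hi)
```

In this toy setting the same action sequence is rejected under the conservative prior bound but accepted once the disturbance set is tightened from data, which is the sense in which a refined disturbance model can reduce conservativeness. A box fitted to a few samples carries no formal guarantee on its own; how the bound is refined and certified in practice is specific to the paper's method.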

External IDs: dblp:conf/iros/HadjiloizouWYK25