Formal Methods in Robot Policy Learning and Verification: A Survey on Current Techniques and Future Directions
Abstract: As hardware and software systems have grown in complexity, formal methods have become indispensable tools for rigorously specifying acceptable behaviors, synthesizing programs to meet these specifications, and validating the correctness of existing programs. In the field of robotics, a similar trend of rising complexity has emerged, driven in large part by the adoption of deep learning. While this shift has enabled the development of highly performant robot policies, their implementation as deep neural networks has posed challenges to traditional formal analysis, leading to models that are inflexible, fragile, and difficult to interpret. In response, the robotics community has introduced new formal and semi-formal methods to support the precise specification of complex objectives, guide the learning process to achieve them, and enable the verification of learned policies against them. In this survey, we provide a comprehensive overview of how formal methods have been used in recent robot learning research. We organize our discussion around two pillars: policy learning and policy verification. For both, we highlight representative techniques, compare their scalability and expressiveness, and summarize how they contribute to meaningful improvements in the safety and correctness of real-world robot systems. We conclude with a discussion of the remaining obstacles to achieving that goal and promising directions for advancing formal methods in robot learning.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Included a complete discussion of run-time monitoring methods, divided into two categories. We explicitly discuss specification-based run-time monitoring methods in the first labeled paragraph of Section 4.4, citing the foundational work by Donzé et al. (2013) and the survey by Bartocci et al. (2018), as well as more recent work by Bakhirkin & Basset (2019), Bonnah & Hoque (2022), Chalupa & Henzinger (2023), Pinisetty et al. (2017), Ferrando & Delzanno (2023), and Henzinger et al. (2025). The previously discussed non-specification-based methods are covered in the following paragraph.
Expanded the discussion of repair methods in the last labeled paragraph of Section 5.3 to include the work of Yang et al. (2022), as well as recent work by others, citing Dong et al. (2021), Usman et al. (2021), Sohn et al. (2023), Xing et al. (2024), Majd et al. (2024), and Tao & Thakur (2025). In the expanded discussion, we clarify that such recent works are most often used to repair neural networks acting as classifiers, generative models, or discrete-action policies, with specifications tailored to those uses (i.e., specifications constraining the policy's actions), from which we conclude that larger-scale policy repair remains a promising research direction.
Added citations for recent work on automata-guided RL and CBF-based policy synthesis.
Miscellaneous formatting changes throughout.
Fixed title capitalization errors and formatting inconsistencies in references.
Added an acknowledgements section after the conclusion.
Assigned Action Editor: ~Oleg_Arenz1
Submission Number: 5454