Deductive Synthesis of Reinforcement Learning Agents for Infinite Horizon Tasks

Yuning Wang, He Zhu

Published: 2025, Last Modified: 20 Nov 2025, Venue: CAV (4) 2025, License: CC BY-SA 4.0

Abstract: We propose a deductive synthesis framework for constructing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements.

External IDs: dblp:conf/cav/WangZ25