Keywords: Reinforcement Learning Specifications, Automatic Specification Refinement, SpectRL
TL;DR: A framework that refines logical specifications without human intervention
Abstract: Logical specifications have been shown to help reinforcement learning algorithms achieve complex tasks. However, when a task is under-specified, agents may fail to learn useful policies. In this work, we explore the possibility of improving coarse-grained logical specifications via an exploration-guided strategy. We propose **AutoSpec**, a framework that searches for a refinement of a logical specification whose satisfaction implies satisfaction of the original specification, but which provides additional guidance, thereby making it easier for reinforcement learning algorithms to learn useful policies. **AutoSpec** is applicable to reinforcement learning tasks specified in the SpectRL specification logic. We exploit the compositional nature of specifications written in SpectRL and design four refinement procedures that modify the abstract graph of the specification, either by refining its existing edge specifications or by introducing new edge specifications. We prove that all four procedures preserve specification soundness, i.e., any trajectory satisfying the refined specification also satisfies the original one. We then show how **AutoSpec** can be integrated with existing reinforcement learning algorithms that learn policies from logical specifications. Our experiments demonstrate that the refined specifications produced by **AutoSpec** yield promising improvements in the complexity of the control tasks that can be solved.
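The sketch below is a minimal, illustrative toy example of the kind of abstract-graph refinement the abstract describes: an edge of the graph is split by inserting an intermediate subgoal region, so that satisfying the refined path still satisfies the original edge. All names (`AbstractGraph`, `split_edge`, the waypoint predicate) are hypothetical and do not reflect the authors' implementation or the SpectRL API.

```python
# Illustrative sketch only: a toy abstract graph and one edge-splitting
# refinement, loosely in the spirit of the framework described above.
# Names are hypothetical; this is not the authors' code.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]          # e.g. a 2-D position
Predicate = Callable[[State], bool]  # atomic reach condition


@dataclass
class AbstractGraph:
    # nodes are subgoal regions; each edge carries a reach predicate
    nodes: Dict[str, Predicate]
    edges: List[Tuple[str, str, Predicate]] = field(default_factory=list)


def split_edge(g: AbstractGraph, u: str, v: str,
               mid_name: str, mid_region: Predicate) -> AbstractGraph:
    """Replace edge (u, v) by (u, mid) and (mid, v).

    Informal soundness argument: a trajectory that traverses the refined
    path reaches `mid_region` and then the original target of (u, v), so
    it in particular satisfies the original edge specification.
    """
    old = next(p for (a, b, p) in g.edges if (a, b) == (u, v))
    kept = [(a, b, p) for (a, b, p) in g.edges if (a, b) != (u, v)]
    new_nodes = dict(g.nodes, **{mid_name: mid_region})
    new_edges = kept + [(u, mid_name, mid_region), (mid_name, v, old)]
    return AbstractGraph(new_nodes, new_edges)


if __name__ == "__main__":
    # original spec: reach the goal region around (1, 1)
    goal = lambda s: abs(s[0] - 1.0) < 0.1 and abs(s[1] - 1.0) < 0.1
    g = AbstractGraph(nodes={"init": lambda s: True, "goal": goal},
                      edges=[("init", "goal", goal)])
    # refinement: insert an intermediate waypoint near (0.5, 0.5)
    waypoint = lambda s: abs(s[0] - 0.5) < 0.1 and abs(s[1] - 0.5) < 0.1
    g2 = split_edge(g, "init", "goal", "mid", waypoint)
    print([(a, b) for (a, b, _) in g2.edges])  # [('init', 'mid'), ('mid', 'goal')]
```

In this toy setting, the inserted waypoint plays the role of the extra guidance mentioned in the abstract: an RL algorithm can be rewarded for reaching the intermediate region first, while any trajectory satisfying the refined two-edge path still satisfies the original one-edge specification.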
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 14128