Learning Compact Regular Decision Processes using Priors and Cascades

ICLR 2026 Conference Submission 16646 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Offline Reinforcement Learning, Regular Decision Process, Automata Learning
TL;DR: We study offline RL for Regular Decision Processes (RDPs), introduce the notion of priors for automaton learning, and develop a new algorithm for learning more compact RDPs.
Abstract: In this work we study offline Reinforcement Learning (RL) and extend previous work on learning Regular Decision Processes (RDPs), a class of non-Markovian environments in which the unknown dependence of future observations and rewards on past interactions can be captured by a hidden finite-state automaton. We build on the language metric previously introduced for offline RL in RDPs and propose a novel algorithm that learns significantly more compact RDPs with cycles, which are crucial for scaling to larger, more complex environments. Key to our results is a novel notion of priors for automaton learning, which allows us to exploit prior domain knowledge by factoring out of the state space any feature that is known a priori. We validate our approach experimentally and provide a Probably Approximately Correct (PAC) analysis of our algorithm, showing that it enjoys a sample complexity polynomial in the relevant parameters.
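To make the RDP setting in the abstract concrete, here is a minimal, hypothetical sketch (not the authors' construction or code): a toy environment whose reward depends on the interaction history only through the state of a hidden finite-state automaton. The class name, the parity automaton, and the action labels are assumptions introduced purely for illustration.

```python
# Hypothetical sketch of an RDP-like environment: the reward is
# non-Markovian in the raw observations, but Markovian once the hidden
# automaton state (here, a parity bit) is taken into account.
import random


class ToyRDP:
    """Observations are coin flips; action 'guess' earns reward 1 only if
    an even number of heads has been observed so far. The parity bit is
    the hidden automaton state summarising the history."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.parity = 0  # hidden automaton state: parity of heads seen

    def reset(self):
        self.parity = 0
        return self._observe()

    def _observe(self):
        obs = self.rng.choice(["heads", "tails"])
        if obs == "heads":
            self.parity ^= 1  # automaton transition on the observation
        return obs

    def step(self, action):
        reward = 1.0 if (action == "guess" and self.parity == 0) else 0.0
        return self._observe(), reward


env = ToyRDP()
obs = env.reset()
for _ in range(5):
    obs, reward = env.step("guess")
    print(obs, reward)
```

An offline RL method for RDPs, as studied in the paper, would have to recover such a hidden automaton from logged trajectories; a prior in the paper's sense would correspond to knowledge of part of this structure that need not be learned from data.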
Primary Area: reinforcement learning
Submission Number: 16646