Track: Type A (Regular Papers)
Keywords: Compositional Strategies, Reinforcement Learning, Temporal Composition
Abstract: Traditional Reinforcement Learning (RL) methods can solve complex, long-horizon tasks but struggle to generalize to new non-Markovian tasks without retraining. Recent compositional approaches address this by learning a set of sub-policies that can be composed at test time to solve unseen, temporally extended tasks, formulated as finite-state automata (FSAs), in a zero-shot manner. However, existing methods are typically restricted to discrete domains or behave suboptimally in stochastic environments. We address both limitations by extending compositional RL to continuous state spaces using Radial Basis Function (RBF) features and a novel regression-based value iteration algorithm that enables optimal composition over learned sub-policies. Our method supports more globally efficient planning in environments with spatially extended goals and achieves optimal behavior in both deterministic and stochastic settings, outperforming prior compositional baselines.
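Since the abstract names the technique but not its details, the following is a minimal, self-contained sketch of the general idea it describes: regression-based (fitted) value iteration over Gaussian RBF features on a continuous state space. The toy 1-D MDP, the RBF center layout, and the bandwidth are all assumptions made for illustration; this is not the authors' actual algorithm or task setup.

```python
# Illustrative sketch only: fitted value iteration with Gaussian RBF
# features on a hypothetical 1-D continuous-state MDP. All constants
# (centers, bandwidth, dynamics, reward) are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Gaussian RBF features over the state interval [0, 1]
centers = np.linspace(0.0, 1.0, 15)   # assumed RBF center layout
width = 0.08                          # assumed shared bandwidth

def phi(s):
    """Map states s (shape [n]) to RBF features (shape [n, k])."""
    return np.exp(-((s[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

# Toy stochastic MDP: move left/right with Gaussian noise; reward near s = 1.
actions = np.array([-0.05, 0.05])

def step(s, a):
    s_next = np.clip(s + a + rng.normal(0.0, 0.01, size=s.shape), 0.0, 1.0)
    reward = (s_next > 0.95).astype(float)
    return s_next, reward

gamma = 0.95
n_samples = 500
w = np.zeros(len(centers))            # weights of the linear value function

# Regression-based value iteration: each sweep forms sampled Bellman
# targets max_a [r + gamma * V(s')] and least-squares-fits V(s) = phi(s) @ w.
for sweep in range(100):
    s = rng.uniform(0.0, 1.0, n_samples)
    targets = np.full(n_samples, -np.inf)
    for a in actions:
        s_next, r = step(s, a)
        targets = np.maximum(targets, r + gamma * (phi(s_next) @ w))
    w, *_ = np.linalg.lstsq(phi(s), targets, rcond=None)

print("V(0.5) ~", float((phi(np.array([0.5])) @ w)[0]))
```

In the paper's setting, the same regression step would presumably back values up over the product of the environment state and the FSA state, so that automaton transitions and learned sub-policy values enter the Bellman target; the sketch covers only the flat single-task case.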
Serve As Reviewer: ~Tim_van_Gelder1
Submission Number: 7