Track: Type A (Regular Papers)
Keywords: Compositional Strategies, Reinforcement Learning, Temporal Composition
Abstract: Traditional Reinforcement Learning (RL) methods can solve complex, long-horizon tasks but struggle to generalize to new non-Markovian tasks without retraining. Recent compositional approaches address this by learning a set of sub-policies that can be composed at test time to solve unseen, temporally extended tasks, formulated as finite-state automata (FSAs), in a zero-shot manner. However, existing methods are typically restricted to discrete domains or behave suboptimally in stochastic environments. We address both limitations by extending compositional RL to continuous state spaces using Radial Basis Function (RBF) features and a novel regression-based value iteration algorithm that enables optimal composition over learned sub-policies. Our method supports more globally efficient planning in environments with spatially extended goals and achieves optimal behavior in both deterministic and stochastic settings, outperforming prior compositional baselines.
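Since the abstract names the technique but not its details, the following is a minimal, self-contained sketch of the general idea it describes: regression-based (fitted) value iteration over Gaussian RBF features on a continuous state space. The toy 1-D MDP, the RBF center layout, and the bandwidth are all assumptions made for illustration; this is not the authors' actual algorithm or task setup.

```python
# Illustrative sketch only: fitted value iteration with Gaussian RBF
# features on a hypothetical 1-D continuous-state MDP. All constants
# (centers, bandwidth, dynamics, reward) are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Gaussian RBF features over the state interval [0, 1]
centers = np.linspace(0.0, 1.0, 15)   # assumed RBF center layout
width = 0.08                          # assumed shared bandwidth

def phi(s):
    """Map states s (shape [n]) to RBF features (shape [n, k])."""
    return np.exp(-((s[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

# Toy stochastic MDP: move left/right with Gaussian noise; reward near s = 1.
actions = np.array([-0.05, 0.05])

def step(s, a):
    s_next = np.clip(s + a + rng.normal(0.0, 0.01, size=s.shape), 0.0, 1.0)
    reward = (s_next > 0.95).astype(float)
    return s_next, reward

gamma = 0.95
n_samples = 500
w = np.zeros(len(centers))            # weights of the linear value function

# Regression-based value iteration: each sweep forms sampled Bellman
# targets max_a [r + gamma * V(s')] and least-squares-fits V(s) = phi(s) @ w.
for sweep in range(100):
    s = rng.uniform(0.0, 1.0, n_samples)
    targets = np.full(n_samples, -np.inf)
    for a in actions:
        s_next, r = step(s, a)
        targets = np.maximum(targets, r + gamma * (phi(s_next) @ w))
    w, *_ = np.linalg.lstsq(phi(s), targets, rcond=None)

print("V(0.5) ~", float((phi(np.array([0.5])) @ w)[0]))
```

In the paper's setting, the same regression step would presumably back values up over the product of the environment state and the FSA state, so that automaton transitions and learned sub-policy values enter the Bellman target; the sketch covers only the flat single-task case.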
Serve As Reviewer: ~Tim_van_Gelder1
Submission Number: 7