Long-Horizon Planning with Predictable Skills

Published: 09 May 2025, Last Modified: 28 May 2025 · RLC 2025 · CC BY 4.0
Keywords: model-based reinforcement learning, skill learning, long-horizon planning, long-term credit assignment, compounding model errors
TL;DR: We learn predictable, temporally extended skills in tandem with a skill world model, and solve long-horizon tasks by planning over the entire episode.
Abstract: Model-based reinforcement learning (RL) leverages learned world models to plan ahead or train in imagination. Recently, this approach has significantly improved sample efficiency and performance across various challenging domains ranging from playing games to controlling robots. However, there are fundamental limits to how accurate the long-term predictions of a world model can be, for example due to unstable environment dynamics or partial observability. These issues are further exacerbated by the compounding error problem. Model-based RL is therefore currently limited to short rollouts with the world model, and consequently struggles with long-horizon problems. We argue that this limitation can be addressed by modeling the outcome of temporally extended skills instead of the effect of primitive actions. To this end, we propose a mutual-information-based skill learning objective that ensures predictable, diverse, and task-related behavior. The resulting skills compensate for perturbations and drifts, enabling stable long-horizon planning. We design a sample-efficient hierarchical agent consisting of model predictive control with an abstract skill world model on the higher level, and skill execution on the lower level. We demonstrate that our approach, Stable Planning with Temporally Extended Skills (SPlaTES), solves a range of challenging long-horizon continuous control problems, outperforming competitive model-based and skill-based methods.
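The abstract describes a two-level loop: model predictive control over a learned skill world model selects temporally extended skills at the higher level, and a low-level policy executes each skill. Below is a minimal sketch of that planning loop, assuming a CEM-style planner and stand-in components (`skill_world_model`, `reward_fn`, the dimensions and hyperparameters); none of these names, interfaces, or values come from the paper, and the paper's actual models, skill objective, and planner may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, SKILL_DIM = 4, 2                 # abstract state / skill-code sizes (assumed)
HORIZON, POP, ITERS, ELITES = 5, 64, 3, 8   # planning hyperparameters (assumed)


def skill_world_model(state, skill):
    """Stand-in for a learned skill world model: predicts the abstract state
    reached after executing one temporally extended skill."""
    return state + 0.1 * np.tanh(skill).repeat(STATE_DIM // SKILL_DIM)


def reward_fn(state):
    """Stand-in task reward on abstract states (negative distance to a goal)."""
    goal = np.ones(STATE_DIM)
    return -np.linalg.norm(state - goal)


def plan_skills(state):
    """CEM-style model predictive control over sequences of skill codes,
    rolling out the skill world model for the planning horizon."""
    mean = np.zeros((HORIZON, SKILL_DIM))
    std = np.ones((HORIZON, SKILL_DIM))
    for _ in range(ITERS):
        cand = mean + std * rng.standard_normal((POP, HORIZON, SKILL_DIM))
        returns = np.empty(POP)
        for i in range(POP):
            s, ret = state.copy(), 0.0
            for t in range(HORIZON):
                s = skill_world_model(s, cand[i, t])
                ret += reward_fn(s)
            returns[i] = ret
        elite = cand[np.argsort(returns)[-ELITES:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean[0]  # execute only the first skill, then replan


# High-level loop: plan a skill, hand it to the (omitted) low-level skill policy,
# observe the resulting abstract state, and replan.
state = np.zeros(STATE_DIM)
for step in range(3):
    skill = plan_skills(state)
    state = skill_world_model(state, skill)  # placeholder for real skill execution
    print(f"step {step}: skill={np.round(skill, 2)}, reward={reward_fn(state):.3f}")
```

The sketch only illustrates the planning-over-skills structure; the mutual-information skill learning objective that makes the skills predictable enough for such long rollouts is trained separately and is not shown here.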
Submission Number: 136