Keywords: Model-Based Reinforcement Learning, Skill Dynamics Model
Abstract: Model-based reinforcement learning (RL) is a sample-efficient way of learning complex behaviors: it leverages a learned dynamics model to optimize a policy and plan actions in imagination. Prior work in model-based RL has mostly been confined to single-step dynamics models, akin to a human attending to every individual muscle movement while planning. Humans instead plan with high-level skills (i.e., temporal abstractions of primitive actions) to solve long-horizon tasks. In this work, we present a Skill-based Model-based RL framework (SkiMo) that enables planning in the skill space using a skill dynamics model, which directly predicts the outcome of a skill rather than predicting every small detail of the intermediate states step by step. For accurate and efficient long-term planning, we jointly learn the skill dynamics model and a skill repertoire from large prior experience. We then harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space, which enables efficient downstream learning of long-horizon, sparse-reward tasks. Experimental results in navigation and manipulation domains show that SkiMo extends the temporal horizon of model-based approaches and improves the sample efficiency of both model-based RL and skill-based RL.
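The core distinction the abstract draws can be illustrated with a minimal sketch. This is not the paper's implementation: the dynamics are a toy linear function, the skill is simply a primitive action repeated for a hypothetical horizon `H`, and the "models" are hand-written rather than learned. The sketch only shows why a skill dynamics model, which predicts the outcome of a whole skill in one call, needs far fewer model queries than a single-step model rolled out action by action.

```python
import numpy as np

# Toy skill horizon: one skill abstracts H primitive actions (assumption).
H = 10

def single_step_model(state, action):
    # Toy linear dynamics standing in for a learned one-step model.
    return state + 0.1 * action

def skill_dynamics_model(state, skill):
    # Toy skill dynamics: directly predicts the state after executing
    # the entire skill, equivalent here to H repeated primitive steps.
    return state + 0.1 * H * skill

state = np.zeros(2)
skill = np.array([1.0, -0.5])  # here: a primitive action repeated H times

# Planning with the single-step model: H model calls per skill.
s = state.copy()
for _ in range(H):
    s = single_step_model(s, skill)

# Planning with the skill dynamics model: one call per skill.
s_skill = skill_dynamics_model(state, skill)

# Same predicted outcome, with H times fewer model queries.
assert np.allclose(s, s_skill)
```

In the actual method, both models are learned from prior experience and the skill is a latent variable decoded into a sequence of actions, but the efficiency argument is the same: planning over skills shortens the effective horizon by a factor of the skill length.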