Accelerating Model-Based Reinforcement Learning with State-Space World Models

Elie Aljalbout; Maria Krinner; Angel Romero; Davide Scaramuzza

Published: 06 Mar 2025, Last Modified: 15 Apr 2025

Venue: ICLR 2025 Workshop World Models

License: CC BY 4.0

Keywords: Model-based Reinforcement Learning, State-Space Models, Sequence Modeling

TL;DR: We make model-based RL faster by using state-space models for sequence modeling in the world model. We showcase the results on a real robot.

Abstract: Model-based RL (MBRL) simultaneously learns a policy and a world model that captures the environment’s dynamics and rewards. The world model can either be used for planning, for data collection, or to provide first-order policy gradients for training. Leveraging a world model significantly improves sample efficiency compared to model-free RL. However, training a world model alongside the policy increases the computational complexity, leading to longer training times that are often intractable for complex real-world scenarios. In this work, we propose a new method for accelerating model-based RL using state-space world models. Our approach leverages state-space models (SSMs) to parallelize the training of the dynamics model, which is typically the main computational bottleneck. Additionally, we propose an architecture that provides privileged information to the world model during training, which is particularly relevant for partially observable environments. We evaluate our method in several real-world agile quadrotor flight tasks, involving complex dynamics, for both fully and partially observable environments. We demonstrate a significant speedup, reducing the world model training time by up to 10 times, and the overall MBRL training time by up to 4 times. This benefit comes without compromising performance, as our method achieves similar sample efficiency and task rewards to state-of-the-art MBRL methods.
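The abstract does not say which SSM or scan algorithm the authors use, so the following is only an illustrative sketch of why a linear state-space recurrence can be trained in parallel over time. It assumes a diagonal linear recurrence h_t = a_t * h_{t-1} + b_t with h_0 = 0, and compares a step-by-step loop against a doubling (Hillis-Steele) scan that needs only about log2(T) rounds of elementwise operations. The function names `sequential_scan` and `doubling_scan` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sequential_scan(a, b, h0=0.0):
    """Reference: compute h_t = a_t * h_{t-1} + b_t one step at a time."""
    h = np.empty_like(b)
    prev = h0
    for t in range(len(b)):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h


def doubling_scan(a, b):
    """Same recurrence (with h_0 = 0) via an inclusive doubling scan.

    Each pair (A_t, B_t) represents the affine map h -> A_t * h + B_t over a
    window of time steps ending at t. Composing two adjacent windows is
    associative, so the windows can be doubled in size each round.
    """
    A, B = a.copy(), b.copy()
    T = len(b)
    shift = 1
    while shift < T:
        A_new, B_new = A.copy(), B.copy()
        # Compose the window ending at t - shift with the window ending at t.
        B_new[shift:] = A[shift:] * B[:-shift] + B[shift:]
        A_new[shift:] = A[shift:] * A[:-shift]
        A, B = A_new, B_new
        shift *= 2
    return B  # with h_0 = 0, the state h_t equals the offset B_t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.uniform(0.5, 1.0, size=(16, 3))  # assumed per-step decay factors
    b = rng.normal(size=(16, 3))             # assumed per-step inputs
    print(np.allclose(sequential_scan(a, b), doubling_scan(a, b)))
```

On an accelerator, every round of the doubling scan is a single batched elementwise operation, which is the general reason linear recurrences avoid the step-by-step dependency of nonlinear recurrent models. This sketch makes no claim about the speedups reported in the abstract.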

Submission Number: 25
