PWM: Policy Learning with Multi-Task World Models

Ignat Georgiev; Varun Giridhar; Nicklas Hansen; Animesh Garg

PWM: Policy Learning with Multi-Task World Models

Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg

Published: 22 Jan 2025, Last Modified: 09 May 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: reinforcement learning, model-based reinforcement learning, continuous control, world models

TL;DR: We propose an MBRL method that leverages pre-trained, multi-task world models for efficient policy learning using first-order optimization.

Abstract: Reinforcement Learning (RL) has made significant strides in complex tasks but struggles in multi-task settings with different embodiments. World model methods offer scalability by learning a simulation of the environment but often rely on inefficient gradient-free optimization methods for policy extraction. In contrast, gradient-based methods exhibit lower variance but fail to handle discontinuities. Our work reveals that well-regularized world models can generate smoother optimization landscapes than the actual dynamics, facilitating more effective first-order optimization. We introduce Policy learning with multi-task World Models (PWM), a novel model-based RL algorithm for continuous control. Initially, the world model is pre-trained on offline data, and then policies are extracted from it using first-order optimization in less than 10 minutes per task. PWM effectively solves tasks with up to 152 action dimensions and outperforms methods that use ground-truth dynamics. Additionally, PWM scales to an 80-task setting, achieving up to 27\% higher rewards than existing baselines without relying on costly online planning. Visualizations and code are available at [imgeorgiev.com/pwm](https://www.imgeorgiev.com/pwm/).

Supplementary Material: pdf

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8505

Loading