Parallelizing Model-based Reinforcement Learning Over the Sequence Length

Published: 25 Sept 2024, Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY-NC-ND 4.0
Keywords: Model-based reinforcement learning, world model, parallelization
TL;DR: This paper introduces the PaMoRL framework, which parallelizes MBRL over the sequence length, improving training speed and sample efficiency.
Abstract: Recently, Model-based Reinforcement Learning (MBRL) methods have demonstrated stunning sample efficiency in various RL domains. However, achieving this extraordinary sample efficiency comes with additional training costs in terms of computation, memory, and training time. To address these challenges, we propose the **Pa**rallelized **Mo**del-based **R**einforcement **L**earning (**PaMoRL**) framework. PaMoRL introduces two novel techniques: the **P**arallel **W**orld **M**odel (**PWM**) and the **P**arallelized **E**ligibility **T**race **E**stimation (**PETE**), which parallelize both the model learning and policy learning stages of current MBRL methods over the sequence length. Our PaMoRL framework is hardware-efficient and stable, and it can be applied to tasks with discrete or continuous action spaces using a single set of hyperparameters. The empirical results demonstrate that PWM and PETE within PaMoRL significantly increase training speed without sacrificing inference efficiency. In terms of sample efficiency, PaMoRL maintains MBRL-level sample efficiency, outperforming other no-look-ahead MBRL methods and model-free RL methods, and on certain tasks it even exceeds planning-based MBRL methods and methods with larger networks.
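
To illustrate the general idea of parallelizing return estimation over the sequence length, the sketch below computes λ-returns (an eligibility-trace-style estimator) with an associative scan instead of a sequential backward loop. This is a minimal, hypothetical example under the assumption that the backward recursion is rewritten as a composition of affine maps; the function name `lambda_returns_parallel` and the use of `jax.lax.associative_scan` are illustrative choices and not the authors' PETE implementation, whose details are given in the paper.

```python
import jax
import jax.numpy as jnp

def lambda_returns_parallel(rewards, values, discounts, lam=0.95):
    """Compute lambda-returns for a length-T trajectory with a parallel
    (associative) scan over the sequence length.

    rewards:   r_0 .. r_{T-1}
    values:    V(s_1) .. V(s_T)      (bootstrap values, shifted by one step)
    discounts: gamma_0 .. gamma_{T-1} (set to 0 at terminal steps)
    """
    # Backward recursion: G_t = a_t + b_t * G_{t+1}, with G_T = V(s_T)
    a = rewards + discounts * (1.0 - lam) * values
    b = discounts * lam

    # Composing two affine maps (a1, b1) after (a2, b2):
    # G -> a1 + b1 * (a2 + b2 * G) = (a1 + b1 * a2) + (b1 * b2) * G
    def compose(x, y):
        a1, b1 = x
        a2, b2 = y
        return a1 + b1 * a2, b1 * b2

    # Scan from the end of the trajectory toward the start in parallel.
    a_acc, b_acc = jax.lax.associative_scan(compose, (a, b), reverse=True)

    # Close the recursion with the final bootstrap value V(s_T).
    return a_acc + b_acc * values[-1]
```

Because affine-map composition is associative, the scan produces the same returns as the usual O(T) backward loop while exposing parallelism across time steps, which is the kind of sequence-length parallelism the abstract refers to.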
Supplementary Material: zip
Primary Area: Reinforcement learning
Submission Number: 5860