Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners

Published: 02 May 2024, Last Modified: 25 Jun 2024. ICML 2024 Poster. License: CC BY 4.0
Abstract: The surge in the volume of available video data offers unprecedented opportunities for advancing reinforcement learning (RL). This growth has motivated the development of passive RL, which seeks to convert passive observations into actionable insights. This paper explores the prerequisites and mechanisms through which passive data can be used to improve online RL. We show that, under identifiable dynamics, where the effect of actions can be distinguished from environment stochasticity, learning from passive data is statistically beneficial. Building on these theoretical insights, we propose a novel algorithm, Multiscale State-Centric Planners (MSCP), that leverages two planners at distinct scales to provide guidance across varying levels of abstraction. The fast planner targets immediate objectives, while the slow planner focuses on longer-term goals. Notably, the fast planner incorporates pessimistic regularization to address the distributional shift between offline and online data. MSCP effectively handles the practical challenges of imperfect pretraining and limited dataset coverage. Our empirical evaluations across multiple benchmarks demonstrate that MSCP significantly outperforms existing approaches, underscoring its proficiency in addressing complex, long-horizon tasks through the strategic use of passive data.
Submission Number: 3117
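
The abstract describes MSCP only at a high level: a slow planner supplies long-horizon subgoals, a fast planner pursues immediate objectives, and the fast planner is pessimistically regularized against state-action pairs poorly covered by the offline (passive) data. The toy Python sketch below illustrates that division of labor under strong simplifying assumptions (a 1-D chain environment, count-based pessimism, and function and variable names invented here for illustration); it is not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only (not the paper's implementation): a "slow" planner
# proposes a distant subgoal, and a "fast" planner picks the one-step action
# that best approaches it, penalized pessimistically when the state-action
# pair was rarely observed in the offline (passive) data.

N_STATES = 20
ACTIONS = np.array([-1, 0, +1])   # move left, stay, move right
GOAL = N_STATES - 1

def slow_planner(state, horizon=5):
    """Longer-term guidance: propose a subgoal several steps toward the goal."""
    direction = np.sign(GOAL - state)
    return int(np.clip(state + direction * horizon, 0, N_STATES - 1))

def fast_planner(state, subgoal, visit_counts, beta=0.5):
    """Immediate objective: choose the action closest to the subgoal, with a
    count-based pessimism penalty beta / sqrt(1 + count) for rarely seen pairs."""
    scores = []
    for i, a in enumerate(ACTIONS):
        next_state = int(np.clip(state + a, 0, N_STATES - 1))
        progress = -abs(subgoal - next_state)            # short-horizon objective
        pessimism = beta / np.sqrt(1 + visit_counts[state, i])
        scores.append(progress - pessimism)
    return int(np.argmax(scores))

# Pretend offline coverage: "move right" was observed often, other actions rarely.
visit_counts = np.zeros((N_STATES, len(ACTIONS)))
visit_counts[:, 2] = 50

state = 0
for t in range(30):
    subgoal = slow_planner(state)                        # slow: long-term goal
    a_idx = fast_planner(state, subgoal, visit_counts)   # fast: immediate step
    state = int(np.clip(state + ACTIONS[a_idx], 0, N_STATES - 1))
    if state == GOAL:
        print(f"reached goal at step {t}")
        break
```

In this sketch the pessimism term steers the fast planner toward well-covered actions, mirroring (in a very simplified form) how the abstract says the fast planner is regularized against offline-to-online distribution shift.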