Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data

ICLR 2026 Conference Submission 12017 Authors

18 Sept 2025 (modified: 23 Dec 2025) · CC BY 4.0
Keywords: Reinforcement Learning, Reinforcement Learning from offline data
Abstract: Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two techniques: i) experience rehearsal and ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves a 102.8% relative improvement in aggregate score over learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a clear margin.
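The abstract only names the two techniques, so the following is a minimal, hedged sketch of how "experience rehearsal" is commonly realized: each world-model update draws a mixed minibatch from the non-curated offline buffer and the online replay buffer, which limits distributional shift during fine-tuning. The buffer layout, the helper name `sample_mixed_batch`, and the `rehearsal_ratio` value are illustrative assumptions, not the authors' implementation.

```python
# Sketch of experience rehearsal via mixed minibatch sampling (assumed design,
# not the paper's implementation).
import random
from typing import Any, Dict, List


def sample_mixed_batch(
    offline_buffer: List[Dict[str, Any]],
    online_buffer: List[Dict[str, Any]],
    batch_size: int = 64,
    rehearsal_ratio: float = 0.5,  # fraction of the batch drawn from offline data (assumption)
) -> List[Dict[str, Any]]:
    """Draw a minibatch that rehearses offline transitions alongside online ones."""
    n_offline = int(batch_size * rehearsal_ratio)
    n_online = batch_size - n_offline
    batch = random.choices(offline_buffer, k=n_offline) if offline_buffer else []
    batch += random.choices(online_buffer, k=n_online) if online_buffer else []
    random.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Placeholder transitions: offline data is reward-free, online data is task-labeled.
    offline = [{"obs": i, "action": 0, "reward": None} for i in range(100)]
    online = [{"obs": i, "action": 1, "reward": 1.0} for i in range(20)]
    batch = sample_mixed_batch(offline, online, batch_size=8)
    print(len(batch), "transitions sampled for a world-model update")
```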
Primary Area: reinforcement learning
Submission Number: 12017