Keywords: Model-based reinforcement learning, Model-based Exploration, Generative model, World model
TL;DR: We propose MoGE, which enhances off-policy RL exploration by generating critical experiences, leading to significant improvements in sample efficiency and performance ceilings across various tasks.
Abstract: Exploration is crucial in Reinforcement Learning (RL) as it enables the agent to understand the environment for better decision-making. Existing exploration methods fall into two paradigms: active exploration, which injects stochasticity into the policy but struggles in high-dimensional environments, and passive exploration, which manages the replay buffer to prioritize under-explored regions but lacks sample diversity. To address the limitations of passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration by generating under-explored critical states and synthesizing dynamics-consistent experiences. MoGE consists of two components: (1) a diffusion generator that produces critical states guided by entropy and TD error, and (2) a one-step imagination world model that constructs critical transitions for agent learning. Our method is simple to implement and integrates seamlessly with mainstream off-policy RL algorithms without structural modifications. Experiments on OpenAI Gym and the DeepMind Control Suite demonstrate that MoGE, as an exploration augmentation, significantly enhances efficiency and performance in complex tasks.
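To make the two-component design in the abstract concrete, here is a minimal conceptual sketch of how generated critical states and a one-step world model could be mixed into an off-policy update batch. All names (state_generator, world_model, policy, synth_ratio) and the placeholder dynamics are illustrative assumptions, not the authors' implementation.

```python
# Conceptual sketch: augment a real replay batch with synthetic critical
# transitions built from generated states and a one-step world model.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

def state_generator(batch_size):
    # Placeholder for the diffusion generator of "critical" states
    # (in the paper, guided by policy entropy and TD error).
    return rng.normal(size=(batch_size, STATE_DIM))

def world_model(states, actions):
    # Placeholder for the one-step imagination world model:
    # predicts next states and rewards for the generated states.
    next_states = states + 0.1 * actions.sum(axis=1, keepdims=True)
    rewards = -np.linalg.norm(next_states, axis=1, keepdims=True)
    return next_states, rewards

def policy(states):
    # Placeholder off-policy actor.
    return rng.uniform(-1.0, 1.0, size=(states.shape[0], ACTION_DIM))

def synthesize_batch(batch_size):
    """Construct dynamics-consistent synthetic transitions from generated states."""
    s = state_generator(batch_size)
    a = policy(s)
    s_next, r = world_model(s, a)
    return s, a, r, s_next

def augmented_batch(real_batch, synth_ratio=0.25):
    """Mix real replay transitions with synthetic critical transitions."""
    n_synth = int(len(real_batch[0]) * synth_ratio)
    synth = synthesize_batch(n_synth)
    return tuple(np.concatenate([real, gen]) for real, gen in zip(real_batch, synth))

# Usage: any off-policy learner (e.g., a SAC/TD3 critic and actor step)
# consumes the augmented batch in place of the usual replay batch.
real = (rng.normal(size=(64, STATE_DIM)),
        rng.uniform(-1.0, 1.0, size=(64, ACTION_DIM)),
        rng.normal(size=(64, 1)),
        rng.normal(size=(64, STATE_DIM)))
s, a, r, s_next = augmented_batch(real)
print(s.shape, a.shape, r.shape, s_next.shape)  # (80, 4) (80, 2) (80, 1) (80, 4)
```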
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 26609