DreamExplorations: Leveraging Suboptimal Noisy Robot Trajectories in Offline RL

08 Sept 2025 (modified: 24 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Robot Learning, Reinforcement Learning, Embodied AI
Abstract: Exploration is a desirable property in online reinforcement learning, where the agent can interact with the environment, visit diverse states, and update its policy. In offline reinforcement learning, however, the dataset is static, and traditional offline RL algorithms rely on demonstrations from relatively high-quality agents, so it is very hard to cover the diversity of the state space. In this paper, we show that in offline goal-conditioned reinforcement learning (OGCRL), suboptimal and highly noisy datasets can theoretically be leveraged for state exploration, and we design a pipeline to use them. Highly noisy datasets, which previous work has discarded as useless, instead serve as exploration experts, and the performance of offline reinforcement learning keeps improving as we scale the size of the suboptimal datasets. Experimental results demonstrate that our method consistently outperforms baselines and significantly improves over models trained solely on high-quality data, especially in environments with large state spaces. This work highlights the untapped potential of imperfect data for enhancing the robustness and generalization of offline RL. We will open-source our code after publication.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 2868