Leveraging Suboptimal and Noisy Trajectories for Goal-Conditioned Offline RL

Published: 05 Mar 2026, Last Modified: 14 Mar 2026, ICLR 2026 Workshop RSI Poster, CC BY 4.0
Keywords: Robot Learning, Self-Improving AI, Reinforcement Learning
Abstract: Exploration is a key capability of online reinforcement learning (RL), where agents interact with the environment to discover diverse trajectories and improve their policies. In contrast, offline RL relies on static datasets that typically consist of high-quality demonstrations, limiting state-space exploration. As a result, suboptimal or highly noisy trajectories are often discarded as harmful to learning. In this paper, we show that in offline goal-conditioned reinforcement learning (OGCRL), such imperfect trajectories can instead serve as a valuable source of exploration. We theoretically analyze how suboptimal and noisy trajectories expand state-space coverage and propose a learning pipeline that leverages them as exploration experts while preserving policy learning from high-quality demonstrations. Experiments show that incorporating large-scale noisy trajectories consistently outperforms baselines and improves over models trained solely on expert data, especially in environments with large and complex state spaces. Our findings reveal the untapped potential of imperfect trajectories in offline RL and suggest a scalable path in which increasingly diverse datasets drive policy improvement.
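
The abstract describes a pipeline that treats imperfect trajectories as exploration data while anchoring policy supervision to high-quality demonstrations. As a rough illustration only (the paper's actual algorithm is not reproduced here), the Python sketch below mixes an expert buffer and a noisy buffer under hindsight goal relabeling and down-weights the noisy behavioral-cloning loss. Every identifier (`GCPolicy`, `hindsight_batch`, `mix_ratio`, the 0.1 weight) is an assumption of this sketch, not a name from the paper.

```python
# Hedged sketch, NOT the paper's pipeline: one plausible way to mix expert
# and noisy goal-conditioned trajectories in an offline training loop.
import numpy as np
import torch
import torch.nn as nn


class GCPolicy(nn.Module):
    """Goal-conditioned policy pi(a | s, g): state and goal are concatenated."""

    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))


def hindsight_batch(trajectories, batch_size, rng):
    """Sample (s, a, g) tuples, relabeling goals with future states (HER-style).

    Each trajectory is assumed to be a dict with "obs" and "act" arrays.
    """
    states, actions, goals = [], [], []
    for _ in range(batch_size):
        traj = trajectories[rng.integers(len(trajectories))]
        t = rng.integers(len(traj["obs"]) - 1)
        t_goal = rng.integers(t + 1, len(traj["obs"]))  # a future state as goal
        states.append(traj["obs"][t])
        actions.append(traj["act"][t])
        goals.append(traj["obs"][t_goal])
    to = lambda x: torch.as_tensor(np.array(x), dtype=torch.float32)
    return to(states), to(actions), to(goals)


def train_step(policy, opt, expert, noisy, rng, batch_size=256, mix_ratio=0.5):
    """One update: expert data supervises actions; noisy data widens coverage.

    Here the noisy trajectories still supply behavioral-cloning targets but are
    down-weighted -- an assumed instantiation of "exploration experts".
    """
    n_exp = int(batch_size * mix_ratio)
    s_e, a_e, g_e = hindsight_batch(expert, n_exp, rng)
    s_n, a_n, g_n = hindsight_batch(noisy, batch_size - n_exp, rng)
    loss_e = ((policy(s_e, g_e) - a_e) ** 2).mean()
    loss_n = ((policy(s_n, g_n) - a_n) ** 2).mean()
    loss = loss_e + 0.1 * loss_n  # noisy-data weight is an assumed constant
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The hindsight relabeling is what lets low-return trajectories contribute: a noisy rollout that never reaches the task goal still demonstrates how to reach the states it did visit, which is one way its expanded state-space coverage could feed policy learning.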
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 67