Keywords: Reinforcement Learning, Offline-to-Online Reinforcement Learning, Offline Unsupervised Reinforcement Learning
Abstract: Offline-to-online reinforcement learning (RL), a framework that trains a policy with offline RL and then further fine-tunes it with online RL,
has been considered a promising recipe for data-driven decision-making. While sensible, this framework has drawbacks: it requires domain-specific offline RL pre-training for each task, and is often brittle in practice. In this work, we propose unsupervised-to-online RL (U2O RL),
which replaces domain-specific supervised offline RL with unsupervised offline RL,
as a better alternative to offline-to-online RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks,
but also learns better representations, which often result in even better performance and stability
than supervised offline-to-online RL.
To instantiate U2O RL in practice, we propose a general recipe
that bridges task-agnostic unsupervised offline skill-based policy pre-training and supervised online fine-tuning.
Across experiments in nine state-based and pixel-based environments,
we empirically demonstrate that U2O RL achieves strong performance
that matches or even outperforms previous offline-to-online RL approaches,
while reusing a single pre-trained model for a number of different downstream tasks.
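To make the described recipe concrete, below is a minimal, self-contained sketch of the two-stage U2O RL workflow on toy numpy components: task-agnostic pre-training of a skill-conditioned policy on a reward-free offline dataset, a bridging step that identifies a suitable skill for the downstream task, and supervised online fine-tuning with the task reward. All names (SkillPolicy, pretrain_unsupervised, identify_skill, finetune_online) and the surrogate objectives are hypothetical placeholders, not the authors' implementation.
```python
# Minimal sketch of the U2O RL recipe described in the abstract, using toy numpy
# components. All names and objectives here are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, SKILL_DIM = 4, 2, 8


class SkillPolicy:
    """Linear skill-conditioned policy pi(a | s, z)."""

    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(STATE_DIM + SKILL_DIM, ACTION_DIM))

    def act(self, state, skill):
        return np.concatenate([state, skill]) @ self.W


def pretrain_unsupervised(policy, offline_states, num_steps=1000, lr=1e-3):
    """Stage 1: task-agnostic offline pre-training on reward-free data.

    Stand-in objective: regress toward skill-dependent pseudo-actions,
    mimicking unsupervised skill discovery on an offline dataset."""
    for _ in range(num_steps):
        s = offline_states[rng.integers(len(offline_states))]
        z = rng.normal(size=SKILL_DIM)
        pseudo_action = np.tanh(s[:ACTION_DIM] + z[:ACTION_DIM])  # surrogate target
        x = np.concatenate([s, z])
        grad = np.outer(x, policy.act(s, z) - pseudo_action)  # grad of 0.5*||Wx - t||^2
        policy.W -= lr * grad


def identify_skill(policy, reward_fn, num_candidates=256):
    """Bridge step: pick the latent skill whose actions score best under the task reward."""
    candidates = rng.normal(size=(num_candidates, SKILL_DIM))
    scores = [reward_fn(policy.act(rng.normal(size=STATE_DIM), z)) for z in candidates]
    return candidates[int(np.argmax(scores))]


def finetune_online(policy, skill, reward_fn, num_steps=1000, lr=1e-3):
    """Stage 2: online fine-tuning on the downstream task reward
    (a simple zeroth-order update as a placeholder for an online RL algorithm)."""
    for _ in range(num_steps):
        s = rng.normal(size=STATE_DIM)
        a = policy.act(s, skill)
        noise = rng.normal(scale=0.1, size=ACTION_DIM)
        advantage = reward_fn(a + noise) - reward_fn(a)
        policy.W += lr * advantage * np.outer(np.concatenate([s, skill]), noise)


if __name__ == "__main__":
    offline_states = rng.normal(size=(5000, STATE_DIM))  # reward-free offline dataset
    task_reward = lambda a: -np.sum((a - 0.5) ** 2)      # downstream task reward

    policy = SkillPolicy()
    pretrain_unsupervised(policy, offline_states)        # single pre-trained model
    z_task = identify_skill(policy, task_reward)         # reuse it for this task
    finetune_online(policy, z_task, task_reward)         # online fine-tuning
    print("fine-tuned reward:", task_reward(policy.act(np.zeros(STATE_DIM), z_task)))
```
The key design point the sketch illustrates is that the pre-training stage never sees a task reward, so the same pre-trained policy can be reused for any number of downstream tasks by repeating only the skill-identification and fine-tuning steps.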
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5905