TL;DR: Offline-to-online RL often struggles to fine-tune offline pre-trained agents; we propose OPT, a novel pre-training phase that mitigates this issue and demonstrates strong performance.
Abstract: Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which trains a new value function tailored specifically for effective online fine-tuning. Applying OPT to TD3 and SPOT yields an average 30% performance improvement across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
Lay Summary: Training AI agents using pre-collected data is efficient but often leads to poor performance when those agents are later deployed in the real world. This happens because the data the agent saw during training may differ from what it encounters during deployment, a problem known as distribution shift. In some cases, even starting from scratch performs better than starting from a poorly pre-trained agent.
To address this, we introduce Online Pre-Training for Offline-to-Online RL (OPT), a new approach that gives the agent a chance to adapt before full online learning begins. OPT adds a brief intermediate phase where the agent learns a new decision-making component using a small amount of interaction with the real environment. This prepares the agent to fine-tune more effectively when online learning starts.
We applied OPT to several common reinforcement learning tasks and found that it consistently improves performance, even outperforming prior state-of-the-art methods. Because OPT can be added to many existing algorithms, it provides a simple and effective way to make AI agents more reliable in dynamic, real-world settings.
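The abstract and lay summary describe OPT as a three-stage pipeline: offline pre-training, a short Online Pre-Training phase that fits a new value function on a small amount of real interaction, and then standard online fine-tuning. The sketch below is only a schematic reading of that description, not the authors' implementation; all names (OfflineAgent methods, ReplayBuffer, env, step counts) are illustrative placeholders.

```python
# Schematic three-phase flow suggested by the OPT description.
# Every identifier here is a placeholder assumption, not the paper's code.

import random


class ReplayBuffer:
    """Minimal FIFO buffer of (s, a, r, s', done) transitions."""

    def __init__(self):
        self.data = []

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))


def offline_pretrain(agent, offline_buffer, steps):
    """Phase 1: standard offline RL updates on the fixed offline dataset."""
    for _ in range(steps):
        agent.update_offline(offline_buffer.sample(256))


def online_pretrain(agent, env, online_buffer, steps):
    """Phase 2 (OPT): collect a small amount of online data with the
    pre-trained policy and train a *new* value function on it, so that
    fine-tuning starts from value estimates consistent with online data."""
    state = env.reset()
    for _ in range(steps):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        online_buffer.add((state, action, reward, next_state, done))
        agent.update_new_critic(online_buffer.sample(256))  # new value function only
        state = env.reset() if done else next_state


def online_finetune(agent, env, online_buffer, steps):
    """Phase 3: ordinary online fine-tuning, now guided by the new value function."""
    state = env.reset()
    for _ in range(steps):
        action = agent.act(state, explore=True)
        next_state, reward, done, _ = env.step(action)
        online_buffer.add((state, action, reward, next_state, done))
        agent.update_online(online_buffer.sample(256))  # update actor and new critic
        state = env.reset() if done else next_state
```

Under this reading, the key design choice is that the intermediate phase touches only the newly introduced value function, leaving the pre-trained policy to gather data, before the full fine-tuning loop resumes.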
Primary Area: Reinforcement Learning
Keywords: Reinforcement Learning, Offline-to-Online Reinforcement Learning, Online Pre-Training, Online Fine-Tuning
Submission Number: 10143