- Keywords: Reinforcement Learning, Batch RL, Online RL, Offline RL
- Abstract: Batch RL has seen a surge in popularity and is applicable in many practical scenarios where past data is available. Unfortunately, the performance of batch RL agents is limited in both theory and practice without strong assumptions on the data-collection process e.g. sufficient coverage or a good policy. To enable better performance, we investigate the offline-online setting: The agent has access to a batch of data to train on but is also allowed to learn during the evaluation phase in an online manner. This is an extension to batch RL, allowing the agent to adapt to new situations without having to precommit to a policy. In our experiments, we find that agents trained in an offline-online manner can outperform agents trained only offline or online, sometimes by a large margin, for different dataset sizes and data-collection policies. Furthermore, we investigate the use of optimism vs. pessimism for value functions in the offline-online setting due to their use in batch and online RL.
- One-sentence Summary: We explore a novel setting combining batch and online RL.