NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

08 Jun 2021 (modified: 24 May 2023) · Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Keywords: offline reinforcement learning, offline evaluation, benchmarks
TL;DR: This work introduces new datasets with near real-world properties and benchmarks offline RL algorithms through both online and offline evaluation.
Abstract: Offline reinforcement learning (RL) aims at learning a good policy from a batch of collected data, without extra interactions with the environment during training. However, current offline RL benchmarks commonly have a large reality gap: they involve large datasets collected by highly exploratory policies, and the trained policy is evaluated directly in the environment. In real-world situations, running an overly exploratory policy is prohibited to ensure system safety, the available data is commonly very limited, and a trained policy should be carefully evaluated before deployment. In this paper, we present a near real-world offline RL benchmark, named NeoRL, which contains datasets from various domains with controlled sizes, along with extra test datasets for offline policy evaluation. We evaluate recent state-of-the-art (SOTA) offline RL algorithms on NeoRL, through both online evaluation and purely offline evaluation. The empirical results demonstrate that the tested offline RL algorithms become less competitive with behavior cloning (BC) on many datasets, and that current offline policy evaluation methods can hardly select truly effective policies. We hope this work will shed some light on future research and draw more attention to the challenges of deploying RL in real-world systems.
Supplementary Material: zip
URL: https://github.com/polixir/NeoRL
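For readers who want a sense of how the benchmark's controlled-size datasets and validation splits might be consumed in practice, the minimal sketch below illustrates a plausible workflow. The task name, the `neorl.make` / `get_dataset` calls, their arguments, and the returned dictionary keys are assumptions based on the repository's description of the benchmark, not a verified API reference; consult the repository above for the actual interface.

```python
# Hypothetical usage sketch for loading a NeoRL offline dataset.
# All function names and arguments below are assumed for illustration.
import neorl

# Create one of the benchmark environments (task name assumed).
env = neorl.make("HalfCheetah-v3")

# Fetch an offline training batch plus a held-out validation batch for
# offline policy evaluation; the data-quality level and dataset size
# reflect NeoRL's controlled-size design described in the abstract.
train_data, val_data = env.get_dataset(data_type="low", train_num=100)

# The returned batches are assumed to be dicts of transition arrays
# (observations, actions, rewards, next observations, terminals).
print(train_data.keys())
```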