NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

Rong-Jun Qin; Xingyuan Zhang; Songyi Gao; Xiong-Hui Chen; Zewen Li; Weinan Zhang; Yang Yu

Published: 17 Sept 2022, Last Modified: 20 Apr 2025

Venue: NeurIPS 2022 Datasets and Benchmarks

Readers: Everyone

Keywords: offline reinforcement learning, conservative datasets, offline policy validation, benchmarks

TL;DR: NeoRL presents conservative datasets for offline RL, highlights the complete pipeline for deploying offline RL in real-world applications, and benchmarks recent offline RL algorithms on NeoRL under that pipeline.

Abstract: Offline reinforcement learning (RL) aims to learn effective policies from historical data without additional environment interaction. While applying offline RL, we noticed that previous offline RL benchmarks commonly involve significant reality gaps: rich and overly exploratory datasets, degraded baselines, and missing policy validation. In many real-world situations, running an overly exploratory policy to collect diverse data is prohibited for system safety, so only a narrow data distribution is available. A learned policy is regarded as effective if it outperforms the working behavior policy, and the policy model can be deployed only after it has been well validated, not merely after training has finished. In this paper, we present a Near real-world offline RL benchmark, named NeoRL, to reflect these properties. NeoRL datasets are collected with a more conservative strategy. Moreover, NeoRL contains the offline training and offline validation pipeline before the online test, corresponding to real-world situations. We then evaluate recent state-of-the-art offline RL algorithms on NeoRL. The empirical results demonstrate that some offline RL algorithms are less competitive than behavior cloning and the deterministic behavior policy, implying that they could be less effective in real-world tasks than in previous benchmarks. We also find that current offline policy evaluation methods can hardly select the best policy. We hope this work will shed some light on future research and on deploying RL in real-world systems.
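The pipeline described in the abstract (offline training, then offline validation, then an online test against the behavior policy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not NeoRL's API or the paper's method: the one-step toy environment, the linear policies `a = k * s`, reward-weighted regression as a stand-in for an offline RL algorithm, and a crude kernel-weighted estimator as a stand-in for offline policy evaluation. All function names are hypothetical.

```python
import math
import random


def true_reward(s, a):
    # Hidden ground truth of the toy task; the optimal action is a = 2*s.
    # The learner only ever sees rewards that were logged in the dataset.
    return -(a - 2.0 * s) ** 2


def collect_logged_data(n, rng, noise=0.1):
    # Conservative behavior policy a = 1.5*s with small noise, so the
    # logged actions cover only a narrow band around the working policy.
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = 1.5 * s + rng.gauss(0.0, noise)
        data.append((s, a, true_reward(s, a)))
    return data


def fit_gain(data, temperature=None):
    # Offline training: fit a linear policy a = k*s by least squares.
    # temperature=None is plain behavior cloning; otherwise samples are
    # weighted by exp(reward / temperature) (reward-weighted regression).
    num = den = 0.0
    for s, a, r in data:
        w = 1.0 if temperature is None else math.exp(r / temperature)
        num += w * s * a
        den += w * s * s
    return num / den


def offline_value(k, data, bandwidth=0.1):
    # Offline validation: kernel-weighted average of logged rewards,
    # emphasizing samples whose logged action is close to k*s.
    # This is a crude, biased stand-in for a real OPE method.
    num = den = 0.0
    for s, a, r in data:
        w = math.exp(-0.5 * ((a - k * s) / bandwidth) ** 2)
        num += w * r
        den += w
    return num / den


def online_value(k, rng, episodes=2000):
    # Online test: average true reward of the policy a = k*s.
    total = 0.0
    for _ in range(episodes):
        s = rng.uniform(-1.0, 1.0)
        total += true_reward(s, k * s)
    return total / episodes


def run_pipeline(seed=0):
    rng = random.Random(seed)
    train = collect_logged_data(800, rng)  # used for offline training
    val = collect_logged_data(200, rng)    # held out for offline validation
    candidates = {
        "bc": fit_gain(train),
        "rwr_t0.05": fit_gain(train, temperature=0.05),
        "rwr_t0.01": fit_gain(train, temperature=0.01),
    }
    offline = {name: offline_value(k, val) for name, k in candidates.items()}
    selected = max(offline, key=offline.get)  # pick by offline score only
    online = {name: online_value(k, rng) for name, k in candidates.items()}
    # Baseline: the deterministic behavior policy (noise removed).
    online["behavior_deterministic"] = online_value(1.5, rng)
    return {
        "candidates": candidates,
        "offline": offline,
        "selected": selected,
        "online": online,
    }
```

Calling `run_pipeline()` returns the fitted gains, the offline validation scores, the name of the candidate selected using offline scores alone, and the online returns including the deterministic behavior baseline. Comparing `result["online"][result["selected"]]` against `result["online"]["behavior_deterministic"]` mirrors the abstract's criterion that a policy counts as effective only if it beats the working behavior policy.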

URL: https://github.com/polixir/NeoRL

Dataset Url: https://github.com/polixir/NeoRL

License: All datasets are licensed under the [Creative Commons Attribution 4.0 License (CC BY)](https://creativecommons.org/licenses/by/4.0/), and code is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.html).

Author Statement: Yes

Supplementary Material: zip

Contribution Process Agreement: Yes

In Person Attendance: Yes

Community Implementations: [3 code implementations on CatalyzeX](https://www.catalyzex.com/paper/neorl-a-near-real-world-benchmark-for-offline/code)
