Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

Siyuan Zhang; Nan Jiang

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

Siyuan Zhang, Nan Jiang

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone

Keywords: reinforcement learning, offline RL, policy selection, hyperparameter-free

TL;DR: We propose hyperparameter-free policy-selection algorithms for offline RL.

Abstract: How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL)---which is crucial for hyperparameter tuning---is an important open question. Existing approaches based on off-policy evaluation (OPE) often require additional function approximation and hence hyperparameters, creating a chicken-and-egg situation. In this paper, we design hyperparameter-free algorithms for policy selection based on BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate their effectiveness in discrete-action benchmarks such as Atari. To address performance degradation due to poor critics in continuous-action domains, we further combine BVFT with OPE to get the best of both worlds, and obtain a hyperparameter-tuning method for $Q$-function based OPE with theoretical guarantees as a side product.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: https://github.com/jasonzhang929/BVFT_empirical_experiments

19 Replies

Loading