Lightweight Uncertainty for Offline Reinforcement Learning via Bayesian Posterior

Xudong Yu; Chenjia Bai; Hongyi Guo; Lingxiao Wang; Changhong Wang; Zhen Wang; Zhaoran Wang

Lightweight Uncertainty for Offline Reinforcement Learning via Bayesian Posterior

Xudong Yu, Chenjia Bai, Hongyi Guo, Lingxiao Wang, Changhong Wang, Zhen Wang, Zhaoran Wang

22 Sept 2022 (modified: 13 Feb 2023)ICLR 2023 Conference Withdrawn SubmissionReaders: Everyone

Keywords: Offline reinforcement learning, Uncertainty quantification, Bayesian neural networks

Abstract: Offline Reinforcement Learning (RL) aims to learn optimal policies from fixed datasets. Directly applying off-policy RL algorithms to offline datasets typically suffers from the distributional shift issue and fails to obtain a reliable value estimation for out-of-distribution (OOD) actions. To this end, several methods penalize the value function with uncertainty quantification and achieve tremendous success from both theoretical and empirical perspectives. However, such uncertainty-based methods typically require estimating the lower confidence bound (LCB) of the $Q$-function based on a large number of ensemble networks, which is computationally expensive. In this paper, we propose a lightweight uncertainty quantifier based on approximate Bayesian inference in the last layer of the $Q$-network, which estimates the Bayesian posterior with minimal parameters in addition to the ordinary $Q$-network. We then obtain the uncertainty quantification by the disagreement of the $Q$-posterior. Moreover, to avoid mode collapse in OOD samples and improve diversity in the $Q$-posterior, we introduce a repulsive force for OOD predictions in training. We show that our method recovers the provably efficient LCB-penalty under linear MDP assumptions. We further compare our method with other baselines on the D4RL benchmark. The experimental results show that our proposed method achieves state-of-the-art performance on most tasks with more lightweight uncertainty quantifiers.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)

19 Replies

Loading