Uncertainty Weighted Offline Reinforcement Learning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: reinforcement learning, offline, batch reinforcement learning, off-policy, uncertainty estimation, dropout, actor-critic, bootstrap error
Abstract: Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient in existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that models epistemic uncertainty to detect OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: A simple and effective uncertainty weighted training mechanism for stabilizing offline reinforcement learning.
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=-YOShKy4m
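
The abstract describes down-weighting Bellman targets by dropout-based epistemic uncertainty. Below is a minimal sketch of how such an uncertainty-weighted critic update could look; the function names, the inverse-variance weighting form, and hyperparameters (`beta`, `n_samples`) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mc_dropout_q(critic, state, action, n_samples=10):
    """Estimate mean and variance of Q(s, a) with Monte Carlo dropout.

    Assumes `critic` contains dropout layers; keeping the module in train
    mode makes dropout stochastic, so repeated forward passes give samples
    from an approximate posterior over Q-values.
    """
    critic.train()
    qs = torch.stack([critic(state, action) for _ in range(n_samples)], dim=0)
    return qs.mean(dim=0), qs.var(dim=0)

def uncertainty_weighted_critic_loss(critic, target_critic, actor,
                                     batch, gamma=0.99, beta=1.0):
    s, a, r, s_next, done = batch  # tensors sampled from the offline dataset

    with torch.no_grad():
        a_next = actor(s_next)  # possibly OOD action proposed by the actor
        q_next_mean, q_next_var = mc_dropout_q(target_critic, s_next, a_next)
        target = r + gamma * (1.0 - done) * q_next_mean
        # Down-weight transitions whose bootstrap target is highly uncertain;
        # normalizing keeps the average weight near 1 (an assumption here).
        weight = beta / (q_next_var + 1e-6)
        weight = weight / weight.mean()

    q = critic(s, a)
    return (weight * F.mse_loss(q, target, reduction='none')).mean()
```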