Taming Policy Constrained Offline Reinforcement Learning for Non-expert Demonstrations

Chengqian Gao; Ke Xu; Liu Liu; Deheng Ye; Peilin Zhao; Zhiqiang Xu

22 Sept 2022 (modified: 13 Feb 2023); ICLR 2023 Conference Withdrawn Submission; Readers: Everyone

Keywords: Contaminated Datasets, Robust Offline Reinforcement Learning

TL;DR: The performance losses of policy constraint-based offline RL algorithms on contaminated datasets can be alleviated by gradient penalty and constraint relaxation.

Abstract: A promising paradigm for offline reinforcement learning (RL) is to constrain the learned policy to stay close to the behaviors in the dataset, known as policy constraint offline RL. However, existing methods rely heavily on the purity of the data and exhibit performance degradation or even catastrophic failure when learning from contaminated datasets, i.e., datasets that mix trajectories of diverse quality levels (e.g., expert level, medium level), even though such contaminated data logs are common in the real world. To mitigate this, we first introduce a gradient penalty on the learned value function to tackle exploding Q-function gradients. We then relax the closeness constraint towards non-optimal actions with critic-weighted constraint relaxation. Experimental results on a set of contaminated D4RL MuJoCo and Adroit datasets show that the proposed techniques effectively tame the non-optimal trajectories for policy constraint offline RL methods.
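The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss definitions, so the toy critic `toy_q`, the finite-difference gradient, the exponential weighting, and the `temperature` parameter are all assumptions chosen only to make the idea concrete.

```python
import math


def toy_q(state, action):
    """Hypothetical critic: concave quadratic in the action, peaked at 0.5."""
    return sum(state) - sum((a - 0.5) ** 2 for a in action)


def action_gradient(q_fn, state, action, eps=1e-5):
    """Central finite-difference estimate of dQ/da (assumed stand-in for autograd)."""
    grad = []
    for i in range(len(action)):
        up = list(action)
        up[i] += eps
        down = list(action)
        down[i] -= eps
        grad.append((q_fn(state, up) - q_fn(state, down)) / (2 * eps))
    return grad


def gradient_penalty(q_fn, state, action):
    """Squared L2 norm of dQ/da; adding it to the critic loss discourages steep Q-surfaces."""
    return sum(g * g for g in action_gradient(q_fn, state, action))


def relaxation_weight(q_fn, state, data_action, policy_action, temperature=1.0):
    """Assumed weighting: 1 when the dataset action scores at least as well as the
    policy's action under the critic, decaying towards 0 as it scores worse."""
    advantage = q_fn(state, data_action) - q_fn(state, policy_action)
    return min(1.0, math.exp(advantage / temperature))


def weighted_constraint(q_fn, state, data_action, policy_action, temperature=1.0):
    """Squared distance to the dataset action, scaled by the critic-based weight."""
    dist = sum((p - d) ** 2 for p, d in zip(policy_action, data_action))
    return relaxation_weight(q_fn, state, data_action, policy_action, temperature) * dist
```

Under these assumptions, a dataset action that the critic rates below the policy's own action receives a weight below 1, so the policy is pulled towards it less strongly, while the gradient penalty is zero at the critic's peak and grows as the Q-surface steepens.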

Area: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
