Gradient-norm Constrained Algorithm on Offline and Online Learning

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Gradient-norm constrained; Reinforcement learning; Offline; Online
Abstract: Reinforcement learning (RL) has displayed great potential on both discrete and continuous tasks. However, its applicability in realistic settings is curbed by the inherent uncertainty in value and policy estimates, especially for continuous control problems. We propose an off-policy actor-critic method for deep reinforcement learning (DRL), in which the value and policy are estimated by function approximation. Updating the parameters of multiple neural networks alternately by gradient descent inevitably raises the saddle-point problem, which degrades the overall convergence of training. Moreover, off-policy methods suffer from distribution mismatch, which triggers a vicious cycle of value overestimation when the candidate policies differ markedly from the policy that produced the data in the replay buffer. Consequently, despite their advantage in sample complexity, off-policy actor-critic methods are highly sensitive to network initialization, especially in the absence of expert demonstrations. We tackle these two issues by proposing a novel policy regularization and a related value penalty, respectively. The policy regularization prevents training from settling at a saddle point that masquerades as an optimum and encourages the optimizer to escape it, while the value penalty discourages over-optimistic value estimates. The proposed method is further combined with behavior cloning for offline RL and evaluated on the D4RL benchmarks.
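Since the abstract describes the two ingredients only at a high level, the following is a minimal PyTorch sketch of how a gradient-norm regularizer on the actor, a value penalty on the critic, and a behavior-cloning term could be combined in a TD3-style update. Every functional form and name here (`actor_critic_losses`, `lam_grad`, `lam_pen`, `lam_bc`, the sign of the gradient-norm term) is an assumption for illustration, not the paper's actual algorithm.

```python
# Illustrative sketch only: the submission does not specify the exact losses,
# so all functional forms and coefficients below are assumptions.
import torch
import torch.nn.functional as F

def actor_critic_losses(actor, critic, batch,
                        lam_grad=0.1, lam_pen=1.0, lam_bc=2.5, gamma=0.99):
    s, a, r, s2, done = batch  # tensors from the replay buffer / offline dataset

    # --- Critic loss with a value penalty (assumed form) ---
    with torch.no_grad():
        target_q = r + gamma * (1.0 - done) * critic(s2, actor(s2))
    q = critic(s, a)
    td_loss = F.mse_loss(q, target_q)
    # Penalize Q-values of actions proposed by the current policy relative to
    # dataset actions, discouraging over-optimistic estimates off the data
    # (in the spirit of CQL-style penalties; action detached so only the
    # critic is shaped by this term).
    value_penalty = (critic(s, actor(s).detach()) - q).mean()
    critic_loss = td_loss + lam_pen * value_penalty

    # --- Actor loss with a gradient-norm regularizer (assumed form) ---
    pi_loss = -critic(s, actor(s)).mean()
    grads = torch.autograd.grad(pi_loss, list(actor.parameters()),
                                create_graph=True)
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads) + 1e-12)
    # Rewarding a non-vanishing gradient norm so the optimizer does not
    # settle at a saddle point, where the gradient is near zero.
    bc_loss = F.mse_loss(actor(s), a)  # behavior cloning for the offline setting
    actor_loss = pi_loss - lam_grad * grad_norm + lam_bc * bc_loss

    return actor_loss, critic_loss
```

In a TD3+BC-style training loop one would step the critic optimizer on `critic_loss` and the actor optimizer on `actor_loss`, zeroing gradients between the two updates so each loss only moves its own network.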
Supplementary Material: pdf
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2893