Deep Reinforcement Learning With Adaptive Combined Critics

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: overestimation, continuous control, deep reinforcement learning, policy improvement
Abstract: Overestimation is a long-standing problem in deep value learning: function approximation errors can inflate value estimates and lead to suboptimal policies. Several methods have been proposed to deal with overestimation; however, they may introduce further problems such as underestimation bias and instability. In this paper, we focus on overestimation in deep reinforcement learning for continuous control and propose a novel algorithm that minimizes overestimation, avoids underestimation bias, and retains policy improvement throughout training. Specifically, we add a weight factor that adjusts the influence of two independent critics and update the policy with the combined value of the weighted critics. The updated policy is then involved in the update of the weight factor, for which we propose a novel method with theoretical and experimental guarantees of future policy improvement. We evaluate our method on a set of classical control tasks, and the results show that the proposed algorithms are more computationally efficient and stable than several existing algorithms for continuous control.
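The abstract does not specify the exact update rules, so the following is a minimal PyTorch sketch of the general idea it describes: two independent critics combined through an adaptive weight factor, with the combined value used to update the policy and the updated policy feeding back into the weight update. The interpolation Q_w = w·Q1 + (1−w)·Q2, the sigmoid parameterization of w, and the weight-update loss below are illustrative assumptions, not the paper's method.

```python
# Sketch of "adaptive combined critics" in a TD3/DDPG-style setup.
# Assumed details (not from the paper): the convex combination of the
# two critics, the sigmoid-parameterized weight, and the weight loss.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

state_dim, action_dim = 3, 1
actor = Actor(state_dim, action_dim)
critic1 = Critic(state_dim, action_dim)
critic2 = Critic(state_dim, action_dim)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

# Weight factor in [0, 1] via a sigmoid, so the combined critic
# interpolates between the two independent critics.
w_logit = torch.zeros(1, requires_grad=True)
w_opt = torch.optim.Adam([w_logit], lr=1e-3)

def combined_q(state, action):
    w = torch.sigmoid(w_logit)
    return w * critic1(state, action) + (1 - w) * critic2(state, action)

# Policy update: ascend the combined critic value (the standard
# deterministic policy gradient step from DDPG/TD3).
state = torch.randn(64, state_dim)  # dummy batch of states
actor_loss = -combined_q(state, actor(state)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

# Weight update involving the *updated* policy. One illustrative choice:
# keep the combined estimate close to the more conservative critic,
# discouraging overestimation without committing to the hard minimum
# (which is what induces underestimation bias in clipped double Q-learning).
with torch.no_grad():
    new_action = actor(state)
q1 = critic1(state, new_action)
q2 = critic2(state, new_action)
w = torch.sigmoid(w_logit)
q_comb = w * q1.detach() + (1 - w) * q2.detach()  # gradient flows only to w
w_loss = (q_comb - torch.min(q1, q2).detach()).pow(2).mean()
w_opt.zero_grad()
w_loss.backward()
w_opt.step()
```

A full agent would add target networks, critic regression to Bellman targets, and a replay buffer; this sketch isolates only the combination-and-weight-update loop described in the abstract.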
One-sentence Summary: We propose a novel algorithm that tackles overestimation in deep reinforcement learning for continuous control while avoiding other estimation biases and ensuring policy improvement.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=8LiV3AiveN