Conservative Reinforcement Learning by Q-function Disagreement

21 Sept 2023 (modified: 12 Feb 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Reinforcement Learning, Target Network, Regularization
TL;DR: We propose a new regularization term based on the standard deviation of the critic Q-functions and show that this regularization improves the performance of many SOTA algorithms.
Abstract: In this paper we propose a novel continuous-space RL algorithm that subtracts the standard deviation of the Q-target networks from the Q-target estimate, enforcing a tighter upper bound on the estimated Q-values. Experiments show that this Q-target formulation yields a performance advantage when applied to algorithms such as TD3, TD7, MaxMin, and REDQ on control tasks from the MuJoCo simulation suite.
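The abstract describes penalizing the bootstrapped target by the disagreement (standard deviation) of the target critics. Below is a minimal sketch of that idea, assuming a PyTorch-style target-critic ensemble; the aggregation over critics (mean vs. min), the penalty coefficient `beta`, and the function name `conservative_q_target` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def conservative_q_target(target_qs: torch.Tensor,
                          reward: torch.Tensor,
                          not_done: torch.Tensor,
                          gamma: float = 0.99,
                          beta: float = 1.0) -> torch.Tensor:
    """Sketch of a disagreement-penalized bootstrapped target.

    target_qs: (n_critics, batch) Q-values from the target-critic ensemble
               at the next state and target-policy action.
    reward, not_done: (batch,) transition reward and non-terminal mask.
    Returns a (batch,) Q-target.
    """
    q_mean = target_qs.mean(dim=0)      # aggregate over the ensemble (assumption: mean)
    q_std = target_qs.std(dim=0)        # disagreement across target critics
    # Subtract the disagreement to obtain a tighter, more conservative estimate.
    next_q = q_mean - beta * q_std
    return reward + gamma * not_done * next_q
```

In a TD3-style update, `target_qs` would be the stacked outputs of the target critics, and the resulting value would replace the usual min-over-critics target in the critic loss.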
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3825