Stationary Deep Reinforcement Learning with Quantum K-spin Hamiltonian Equation


22 Sept 2022, 12:35 (modified: 26 Oct 2022, 14:07) | ICLR 2023 Conference Blind Submission | Readers: Everyone
Abstract: Instability, meaning high variance of cumulative rewards over multiple runs, is a major issue of deep reinforcement learning (DRL) algorithms. The instability is mainly caused by the existence of \textit{many local minima} and is worsened by the \textit{multiple fixed points} issue of Bellman's optimality equation. As a fix, we propose a quantum K-spin Hamiltonian regularization term (called the \textit{H-term}) to help a policy network converge to a high-quality local minimum. First, we take a quantum perspective by modeling a policy as a \textit{K-spin Ising model} and employ a Hamiltonian equation to measure the \textit{energy} of a policy. Then, we derive a novel Hamiltonian policy gradient theorem and design a generic actor-critic algorithm that utilizes the H-term to regularize the policy network. Finally, compared with existing algorithms over $20$ runs, the proposed method significantly reduces the variance of cumulative rewards by $65.2\% \sim 85.6\%$ on six MuJoCo tasks; on two combinatorial optimization tasks and two non-convex optimization tasks, it achieves an approximation ratio $\leq 1.05$ over $90\%$ of test cases and reduces the variance by $60.16\% \sim 94.52\%$.
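To make the "energy of a policy" idea concrete, here is a minimal sketch of a textbook K-spin Ising Hamiltonian, $H(s) = -\sum_{(i_1,\dots,i_K)} J_{i_1 \dots i_K} s_{i_1} \cdots s_{i_K}$. The abstract does not specify the form of the H-term, so this is not the authors' regularizer; the function name `k_spin_energy`, the dictionary layout of the couplings, and the example values are all assumptions made for illustration.

```python
import itertools
import numpy as np

def k_spin_energy(spins, couplings):
    """Energy of a spin configuration under a generic K-spin Ising Hamiltonian.

    spins: 1-D sequence of +1/-1 values.
    couplings: dict mapping a tuple of K site indices to a coupling strength J
               (hypothetical layout, chosen for readability).
    Returns H(s) = -sum_J J * s_i1 * ... * s_iK.
    """
    spins = np.asarray(spins, dtype=float)
    energy = 0.0
    for sites, strength in couplings.items():
        # Each K-tuple contributes -J times the product of its K spins.
        energy -= strength * np.prod(spins[list(sites)])
    return float(energy)

# Hypothetical example: 4 spins, K = 3, unit coupling on every triple.
couplings = {sites: 1.0 for sites in itertools.combinations(range(4), 3)}
aligned = k_spin_energy([1, 1, 1, 1], couplings)   # all four triples multiply to +1
mixed = k_spin_energy([1, -1, 1, 1], couplings)    # three triples contain the flipped spin
```

With these assumed couplings the fully aligned configuration has the lower energy (-4.0 versus 2.0), which is the sense in which a Hamiltonian can rank configurations; how the paper maps a policy onto spins is not stated in the abstract.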
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
11 Replies