Stationary Deep Reinforcement Learning with Quantum K-spin Hamiltonian Equation

Xiao-Yang Liu; Zechu Li; Shixun Wu; Xiaodong Wang

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023. Readers: Everyone

Abstract: Instability, namely high variance of cumulative rewards over multiple runs, is a major issue of deep reinforcement learning (DRL) algorithms. The instability is mainly caused by the existence of \textit{many local minima} and worsened by the \textit{multiple fixed points} issue of Bellman's optimality equation. As a fix, we propose a quantum K-spin Hamiltonian regularization term (called \textit{H-term}) to help a policy network converge to a high-quality local minimum. First, we take a quantum perspective by modeling a policy as a \textit{K-spin Ising model} and employ a Hamiltonian equation to measure the \textit{energy} of a policy. Then, we derive a novel Hamiltonian policy gradient theorem and design a generic actor-critic algorithm that utilizes the H-term to regularize the policy network. Finally, the proposed method significantly reduces the variance of cumulative rewards by $65.2\% \sim 85.6\%$ on six MuJoCo tasks. On two combinatorial optimization tasks and two non-convex optimization tasks, it achieves an approximation ratio $\leq 1.05$ on over $90\%$ of test cases and reduces the variance by $60.16\% \sim 94.52\%$. All comparisons are against existing algorithms over $20$ runs.
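To make the notion of the \textit{energy} of a K-spin Ising model concrete, the following is a minimal sketch of the textbook K-spin Hamiltonian $H(s) = -\sum_{i_1 < \dots < i_K} J_{i_1 \dots i_K} s_{i_1} \cdots s_{i_K}$ for spins $s_i \in \{+1, -1\}$. It is not the paper's H-term: the abstract does not specify how a policy is mapped to spins or how the couplings are chosen, so the function names `k_spin_energy` and `uniform_couplings` and the uniform coupling value are assumptions made purely for illustration.

```python
from itertools import combinations
from math import prod


def k_spin_energy(spins, couplings):
    """Return H(s) = -sum_idx J[idx] * prod(s_i for i in idx).

    spins: sequence of +1 / -1 values.
    couplings: dict mapping a tuple of distinct spin indices to a coupling J.
    (Hypothetical helper; the generic K-spin Ising energy, not the paper's H-term.)
    """
    for s in spins:
        if s not in (1, -1):
            raise ValueError("each spin must be +1 or -1")
    return -sum(j * prod(spins[i] for i in idx) for idx, j in couplings.items())


def uniform_couplings(n, k, j=1.0):
    """Assumed toy couplings: the same J on every k-subset of n spins."""
    return {idx: j for idx in combinations(range(n), k)}


if __name__ == "__main__":
    aligned = [1, 1, 1]
    mixed = [1, -1, 1]
    # K = 2 recovers the usual pairwise Ising model.
    print(k_spin_energy(aligned, uniform_couplings(3, 2)))
    print(k_spin_energy(mixed, uniform_couplings(3, 2)))
    # K = 3 couples all three spins in a single term.
    print(k_spin_energy(mixed, uniform_couplings(3, 3)))
```

With positive couplings, aligned spins give a lower energy than mixed spins, which is the sense in which a Hamiltonian can rank configurations. How the paper turns such an energy into a regularizer for the policy network is described in the full text, not in this sketch.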

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)

Supplementary Material: zip
