Stationary Deep Reinforcement Learning with Quantum K-spin Hamiltonian Equation

Xiao-Yang Liu; Zechu Li; Shixun Wu; Xiaodong Wang

16 May 2022 (modified: 05 May 2023), NeurIPS 2022 Submitted, Readers: Everyone

Keywords: deep reinforcement learning, instability, Hamiltonian policy gradient, stationary, quantum K-spin

Abstract: A foundational issue in deep reinforcement learning (DRL) is that \textit{Bellman's optimality equation has multiple fixed points}, so it fails to return a consistent one. Direct evidence of this is the instability of existing DRL algorithms, namely the high variance of cumulative rewards over multiple runs. To address this problem, we propose a quantum K-spin Hamiltonian regularization term (H-term) that helps a policy network stably find a \textit{stationary} policy, which represents the lowest-energy configuration of a system. First, we draw a novel analogy between a Markov Decision Process (MDP) and a \textit{quantum K-spin Ising model} and reformulate the objective function as a quantum K-spin Hamiltonian equation, a functional of the policy that measures its energy. Then, we propose a generic actor-critic algorithm that uses the H-term to regularize the policy/actor network, and we provide the Hamiltonian policy gradient calculations. Finally, on six challenging MuJoCo tasks over 20 runs, the proposed algorithm reduces the variance of cumulative rewards by $65.2\% \sim 85.6\%$ compared with existing algorithms.
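As a rough illustration of the stability metric mentioned in the abstract, the sketch below computes a fractional reduction in the across-run variance of cumulative rewards. The abstract does not say how the variance is computed, so the use of sample variance, the function name `variance_reduction`, and the reward values are all assumptions made for illustration, not results from the paper.

```python
from statistics import variance


def variance_reduction(baseline_returns, proposed_returns):
    """Fractional reduction in across-run variance: 1 - Var(proposed) / Var(baseline).

    Assumption: each list holds one cumulative reward per independent run,
    and sample variance (n - 1 denominator) is the quantity being compared.
    """
    base_var = variance(baseline_returns)
    if base_var == 0:
        raise ValueError("baseline variance is zero; reduction is undefined")
    return 1.0 - variance(proposed_returns) / base_var


# Made-up cumulative rewards over four runs (hypothetical, not from the paper).
baseline = [1000, 3000, 2000, 4000]
proposed = [2000, 3000, 2500, 3500]

reduction = variance_reduction(baseline, proposed)
print(f"variance reduced by {reduction:.1%}")
```

With these made-up numbers the spread of the second list is half that of the first, so its variance is a quarter and the reported reduction is about 75%.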

TL;DR: We apply a quantum K-spin Hamiltonian equation as a regularizer and obtain a new actor-critic algorithm that finds a physically stationary policy.

Supplementary Material: zip
