Hamiltonian Q-Learning: Leveraging Importance-sampling for Data Efficient RL

28 Sept 2020 (modified: 05 May 2023), ICLR 2021 Conference Blind Submission
Keywords: Data efficient RL, $Q$-Learning, Hamiltonian Monte Carlo
Abstract: Model-free reinforcement learning (RL), in particular $Q$-learning, is widely used to learn optimal policies for a variety of planning and control problems. However, when the underlying state-transition dynamics are stochastic and high-dimensional, $Q$-learning requires a large amount of data and incurs a prohibitively high computational cost. In this paper, we introduce Hamiltonian $Q$-Learning, a data efficient modification of the $Q$-learning approach, which adopts an importance-sampling based technique for computing the $Q$ function. To exploit the stochastic structure of the state-transition dynamics, we employ Hamiltonian Monte Carlo to update $Q$ function estimates by approximating the expected future rewards using $Q$ values associated with a subset of next states. Further, to exploit the latent low-rank structure of the dynamical system, Hamiltonian $Q$-Learning uses a matrix completion algorithm to reconstruct the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. By providing an efficient way to apply $Q$-learning in stochastic, high-dimensional problems, the proposed approach broadens the scope of RL algorithms for real-world applications, including classical control tasks and environmental monitoring.
One-sentence Summary: We propose a data efficient modification of the $Q$-learning approach which uses Hamiltonian Monte Carlo to compute the $Q$ function for problems with stochastic, high-dimensional dynamics.
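
As a rough illustration of the idea described in the abstract, the sketch below approximates the Bellman expectation $\mathbb{E}[\max_{a'} Q(s', a')]$ by averaging $Q$ values over next states drawn with a basic Hamiltonian Monte Carlo sampler, rather than summing over all next states. This is not the paper's implementation: the function names (`hmc_sample`, `hamiltonian_q_update`), the leapfrog settings, and the assumption that the transition density's log-probability and its gradient are available in closed form are illustrative choices, and the matrix completion step over state-action pairs is omitted.

```python
import numpy as np

def hmc_sample(log_prob, grad_log_prob, x0, n_samples=32,
               step_size=0.1, n_leapfrog=10, rng=None):
    """Basic Hamiltonian Monte Carlo sampler (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)            # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of the Hamiltonian dynamics
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step_size * p_new
            p_new += step_size * grad_log_prob(x_new)
        x_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis accept/reject on the Hamiltonian (negative log joint)
        h_old = -log_prob(x) + 0.5 * np.dot(p, p)
        h_new = -log_prob(x_new) + 0.5 * np.dot(p_new, p_new)
        if rng.random() < np.exp(h_old - h_new):
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

def hamiltonian_q_update(q_fn, reward, state, transition_log_prob,
                         transition_grad_log_prob, gamma=0.99, n_samples=32):
    """One Q-value target where E[max_a' Q(s', a')] is approximated by
    averaging over HMC-sampled next states (hypothetical helper)."""
    next_states = hmc_sample(transition_log_prob, transition_grad_log_prob,
                             x0=state, n_samples=n_samples)
    future_value = np.mean([np.max(q_fn(s_next)) for s_next in next_states])
    return reward + gamma * future_value
```

In the full method, as the abstract notes, such updates would be computed only for a much smaller subset of state-action pairs, with a matrix completion algorithm reconstructing the remaining entries of the $Q$ function.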
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=U_NsVCNLm