2019 (modified: 11 Nov 2022), ICML 2019
Abstract: Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process’s probabilistic transition model. We propose a parametric Q-learning algo...
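The abstract's starting assumption is that state-action features can linearly express the transition model. The NumPy sketch below illustrates it. It assumes the form P(s'|s,a) = Σ_k φ_k(s,a) ψ_k(s') with K features. The function names `make_linear_mdp` and `expected_next_value` are hypothetical. So is the choice of normalized random φ and ψ, which is one simple way to get valid distributions. None of this is the paper's own construction or its Q-learning algorithm.

```python
import numpy as np


def make_linear_mdp(n_states, n_actions, n_features, seed=0):
    """Build a toy transition tensor P[s, a, s'] = sum_k phi[s, a, k] * psi[k, s'].

    Illustrative assumption: every phi[s, a, :] is a probability vector and
    every psi[k, :] is a distribution over next states. Under that assumption,
    each P[s, a, :] is a convex mixture of the psi rows, so it is a valid
    distribution.
    """
    rng = np.random.default_rng(seed)
    phi = rng.random((n_states, n_actions, n_features))
    phi /= phi.sum(axis=2, keepdims=True)    # feature weights sum to 1
    psi = rng.random((n_features, n_states))
    psi /= psi.sum(axis=1, keepdims=True)    # each psi[k] sums to 1 over s'
    P = np.einsum("sak,kt->sat", phi, psi)   # linear combination of psi rows
    return phi, psi, P


def expected_next_value(phi, psi, V):
    """E[V(s') | s, a] computed through the K-dimensional features only.

    psi @ V has shape (K,). Contracting it with phi gives an
    (n_states, n_actions) array without touching the full P tensor.
    """
    return phi @ (psi @ V)


if __name__ == "__main__":
    phi, psi, P = make_linear_mdp(n_states=6, n_actions=3, n_features=2)
    V = np.arange(6.0)  # an arbitrary value function over next states
    # Both routes to the expected next-state value agree.
    print(np.allclose(expected_next_value(phi, psi, V), P @ V))
```

The final comparison shows why the assumption matters. The expected next-state value depends on (s, a) only through the K features. A K-dimensional quantity can therefore stand in for a full |S|×|A| table. In this toy model, the reshaped transition matrix also has rank at most K.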