Exponential Family Model-Based Reinforcement Learning via Score Matching

Gene Li; Junbo Li; Nathan Srebro; Zhaoran Wang; Zhuoran Yang

12 Oct 2021 (modified: 05 May 2023), Deep RL Workshop NeurIPS 2021, Readers: Everyone

Keywords: exploration, optimization, model-based RL

TL;DR: We propose a model-based RL algorithm for exponential family transitions that uses score matching.

Abstract: We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known. SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression. SMRL achieves $\tilde{O}(d\sqrt{H^3T})$ regret, where $H$ is the length of each episode and $T$ is the total number of interactions.
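
To illustrate why score matching reduces to ridge regression for an exponential family, here is a minimal sketch on a toy problem. It is not the SMRL algorithm: it assumes an unconditional one-dimensional density $p(x) \propto \exp(\theta_1 x + \theta_2 x^2)$ (a Gaussian in natural-parameter form) with a constant base measure, and the function name `score_matching_ridge` and the regularizer `lam` are hypothetical choices for this example. Because the log-density is linear in $\theta$, the score matching objective is quadratic in $\theta$ and has a closed-form regularized least-squares solution that never needs the normalizing constant.

```python
import numpy as np

def score_matching_ridge(x, lam=1e-6):
    """Estimate theta for p(x) proportional to exp(theta[0]*x + theta[1]*x**2).

    Illustrative assumption: sufficient statistics psi(x) = (x, x^2) and a
    constant base measure, so the objective is
        J(theta) = 0.5 * theta^T A theta + theta^T b + (lam / 2) * ||theta||^2.
    """
    x = np.asarray(x, dtype=float)
    # First derivative of each sufficient statistic w.r.t. x: (1, 2x), shape (n, 2)
    grad_psi = np.stack([np.ones_like(x), 2.0 * x], axis=1)
    # Second derivative of each sufficient statistic w.r.t. x: (0, 2), shape (n, 2)
    lap_psi = np.stack([np.zeros_like(x), 2.0 * np.ones_like(x)], axis=1)
    A = grad_psi.T @ grad_psi / len(x)  # empirical quadratic term
    b = lap_psi.mean(axis=0)            # empirical linear term
    # Closed-form minimizer of the regularized quadratic objective
    return -np.linalg.solve(A + lam * np.eye(2), b)

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.5, scale=2.0, size=50_000)
theta = score_matching_ridge(samples)

# Convert natural parameters back to mean and variance:
# theta[0] = mu / sigma^2, theta[1] = -1 / (2 * sigma^2)
sigma2_hat = -1.0 / (2.0 * theta[1])
mu_hat = theta[0] * sigma2_hat
```

With enough samples, `mu_hat` and `sigma2_hat` should land close to the true mean 1.5 and variance 4.0. In the setting described in the abstract the model is a conditional transition distribution with $d$ parameters, so the same idea would apply to features of state-action pairs rather than to a single scalar variable.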
