Efficient Inference and Exploration for Reinforcement LearningDownload PDF

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone
Keywords: Reinforcement Learning, Efficient Exploration, Asymptotic Analysis, Statistical Inference
TL;DR: We investigate the large-sample behaviors of the Q-value estimates and proposed an efficient exploration strategy that relies on estimating the relative discrepancies among the Q estimates.
Abstract: Despite an ever growing literature on reinforcement learning algorithms and applications, much less is known about their statistical inference. In this paper, we investigate the large-sample behaviors of the Q-value estimates with closed-form characterizations of the asymptotic variances. This allows us to efficiently construct confidence regions for Q-value and optimal value functions, and to develop policies to minimize their estimation errors. This also leads to a policy exploration strategy that relies on estimating the relative discrepancies among the Q estimates. Numerical experiments show superior performances of our exploration strategy than other benchmark approaches.
Original Pdf: pdf
8 Replies

Loading