Efficient Inference and Exploration for Reinforcement Learning

Anonymous

24 Sept 2019 (modified: 24 Sept 2019) | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone

Abstract: Despite an ever-growing literature on reinforcement learning algorithms and applications, much less is known about statistical inference for these methods. In this paper, we investigate the large-sample behavior of Q-value estimates and give closed-form characterizations of their asymptotic variances. This allows us to efficiently construct confidence regions for the Q-value and optimal value functions, and to develop policies that minimize their estimation errors. It also leads to a policy exploration strategy that relies on estimating the relative discrepancies among the Q estimates. In numerical experiments, our exploration strategy outperforms other benchmark approaches.

Keywords: Reinforcement Learning, Efficient Exploration, Asymptotic Analysis, Statistical Inference.

TL;DR: We investigate the large-sample behavior of Q-value estimates and propose an efficient exploration strategy that relies on estimating the relative discrepancies among the Q estimates.
