Abstract: Learning with simulators is ubiquitous in
modern reinforcement learning (RL). The simulator can either correspond to a simplified
version of the real environment (such as a
physics simulation of a robot arm) or to the
environment itself (such as in games like Atari
and Go). Among algorithms that are provably
sample-efficient in this setting, most make the
unrealistic assumption that all possible environment states are known before learning
begins, or perform global optimistic planning
which is computationally inefficient. In this
work, we focus on simulation-based RL under
a more realistic local access protocol, where
the state space is unknown and the simulator can only be queried at states that have
previously been observed (initial states and
those returned by previous queries). We propose an algorithm named Confident-LSVI
based on the template of least-squares value
iteration. Confident-LSVI incrementally
builds a coreset of important states and uses
the simulator to revisit them. Assuming that
the linear function class has low approximation error under the Bellman optimality operator (a.k.a. low inherent Bellman error), we
bound the algorithm performance in terms
of this error, and show that the algorithm is both query-efficient and
computationally efficient.
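
To make the local-access protocol and the coreset idea concrete, below is a minimal Python sketch, not the authors' algorithm or analysis: the simulator interface simulator.step, the feature map phi, and the uncertainty threshold tau are illustrative assumptions. It adds a state-action pair to the coreset when its feature direction is poorly covered so far, and then revisits coreset states through the simulator to regress Bellman targets onto features, one least-squares value-iteration round per episode.

```python
# A minimal sketch (not the paper's implementation) of least-squares value
# iteration with local simulator access and a coreset of "important" states.
# Assumed interfaces: simulator.step(s, a) -> (next_state, reward, done),
# phi(s, a) -> d-dimensional feature vector; tau, reg, gamma are illustrative.
import numpy as np


def uncertain(feat, cov_inv, tau):
    """A state-action pair is 'important' if its feature direction is
    poorly covered by the current coreset (large elliptical uncertainty)."""
    return feat @ cov_inv @ feat > tau


def confident_lsvi_sketch(simulator, phi, initial_state, H, d,
                          num_episodes=100, num_actions=2,
                          tau=1.0, reg=1.0, gamma=0.99):
    coreset = []           # (state, action) pairs to revisit via the simulator
    cov = reg * np.eye(d)  # regularized feature covariance of the coreset
    theta = np.zeros(d)    # linear Q-value parameters

    for _ in range(num_episodes):
        state = initial_state
        for _ in range(H):
            # Act greedily w.r.t. the current linear Q estimate.
            q = [phi(state, a) @ theta for a in range(num_actions)]
            action = int(np.argmax(q))
            feat = phi(state, action)

            # Local access: only states already observed may be queried.
            if uncertain(feat, np.linalg.inv(cov), tau):
                coreset.append((state, action))
                cov += np.outer(feat, feat)

            state, _, done = simulator.step(state, action)
            if done:
                break

        # Revisit coreset states and regress Bellman targets onto features
        # (one round of least-squares value iteration).
        feats, targets = [], []
        for (s, a) in coreset:
            next_s, r, _ = simulator.step(s, a)
            next_q = max(phi(next_s, b) @ theta for b in range(num_actions))
            feats.append(phi(s, a))
            targets.append(r + gamma * next_q)
        if feats:
            X, y = np.asarray(feats), np.asarray(targets)
            theta = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

    return theta
```

The sketch only illustrates the query pattern the abstract describes (observe states, add poorly covered ones to a coreset, revisit them with the simulator); the paper's guarantees depend on its specific confidence rule and the low inherent Bellman error assumption, neither of which is captured here.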