Keywords: Reinforcement Learning, Representation Learning, Latent Dynamics, Function Approximation
TL;DR: We study the statistical requirements and algorithmic principles for reinforcement learning under general latent dynamics
Abstract: Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple. However, beyond restrictive settings such as tabular latent dynamics, the fundamental statistical requirements and algorithmic principles for *reinforcement learning under latent dynamics* are poorly understood. This paper addresses the question of reinforcement learning under *general latent dynamics* from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that *most* well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying *latent pushforward coverability* as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient *observable-to-latent* reductions (that is, reductions that transform an arbitrary algorithm for the latent MDP into an algorithm that can operate on rich observations) in two settings: one where the agent has access to hindsight observations of the latent dynamics (Lee et al., 2023), and one where the agent can estimate *self-predictive* latent models (Schwarzer et al., 2020). Together, our results serve as a first step toward a unified statistical and algorithmic theory for reinforcement learning under latent dynamics.
Primary Area: Reinforcement learning
Submission Number: 16721