Putting the Spotlight on the Initial State Distribution

Published: 01 Jul 2025, Last Modified: 23 Jul 2025
Venue: Finding the Frame (RLC 2025)
License: CC BY 4.0
Keywords: reinforcement learning, initial state distribution, resets, local planning, performance difference lemma
Abstract: The initial state distribution in reinforcement learning (RL) is often treated as a technical detail, overshadowed by the focus on policy optimization and value function approximation. This paper challenges that perspective by providing a rigorous analysis of, and intuition for, how the initial state distribution affects the RL objective. We derive performance difference lemmas that quantify how changes in the initial distribution propagate through the learning objective, revealing bounds that scale with $\frac{1}{1-\gamma}$ in the infinite-horizon setting and with $(T+1)$ in the finite-horizon case, where $\gamma$ is the discount factor and $T$ is the horizon length. These lemmas, together with an illustrative example, demonstrate that seemingly minor changes in where an agent begins can lead to dramatically different outcomes, even when following the same policy. These results have immediate implications for practical RL deployments, where the training and testing initial state distributions often differ, and provide an alternative theoretical perspective on recent advances in reverse curriculum learning and local planning algorithms.
Submission Number: 21
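
For intuition on how a bound of the kind described in the abstract can scale with $\frac{1}{1-\gamma}$, a minimal sketch follows. It is not the paper's exact lemma: it assumes a finite state space and rewards bounded in $[0, R_{\max}]$, and the notation $J_\mu(\pi)$, $V^\pi$, and $\|\cdot\|_1$ is introduced here only for illustration.

```latex
% Sketch (illustrative, not the paper's statement): sensitivity of the
% discounted objective to the initial state distribution, assuming a
% finite state space and rewards in [0, R_max].
\[
J_\mu(\pi) \;=\; \mathbb{E}_{s_0 \sim \mu}\!\left[ V^\pi(s_0) \right],
\qquad
V^\pi(s) \;=\; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s,\, \pi \right].
\]
% Comparing the same policy under two start distributions mu and mu':
\[
\bigl| J_\mu(\pi) - J_{\mu'}(\pi) \bigr|
\;=\; \Bigl| \sum_{s} \bigl( \mu(s) - \mu'(s) \bigr)\, V^\pi(s) \Bigr|
\;\le\; \| \mu - \mu' \|_{1}\, \| V^\pi \|_{\infty}
\;\le\; \frac{R_{\max}}{1-\gamma}\, \| \mu - \mu' \|_{1}.
\]
% Finite-horizon analogue: with per-step rewards in [0, R_max] over steps
% t = 0, ..., T, the same Holder argument replaces R_max / (1 - gamma)
% with (T + 1) R_max.
```

Under these assumptions, the sensitivity of the objective to the start distribution is amplified by the effective horizon, which is where the $\frac{1}{1-\gamma}$ and $(T+1)$ factors mentioned in the abstract would enter.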