An Optimal Tightness Bound for the Simulation Lemma

Published: 15 May 2024, Last Modified: 14 Nov 2024
Venue: RLC 2024
License: CC BY 4.0
Keywords: Reinforcement Learning, Simulation, Model-Based Reinforcement Learning, Tabular Reinforcement Learning
TL;DR: We improve the bound presented in the classic simulation lemma, and prove it is tight.
Abstract: We present a bound for value-prediction error with respect to model misspecification that is tight, including constant factors. This is a direct improvement of the "simulation lemma," a foundational result in reinforcement learning. We demonstrate that existing bounds are quite loose, becoming vacuous for large discount factors, due to the suboptimal treatment of compounding probability errors. By carefully considering this quantity on its own, instead of as a subcomponent of value error, we derive a bound that is sub-linear with respect to transition function misspecification. We then demonstrate broader applicability of this technique, improving a similar bound in the related subfield of hierarchical abstraction.
Submission Number: 106
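
To make the abstract's claim about vacuous bounds concrete, the following is a minimal sketch and not the bound derived in the paper. It assumes one common textbook form of the classic simulation lemma for a fixed policy, |V - V_model| <= eps_r / (1 - gamma) + gamma * eps_p * r_max / (1 - gamma)^2, where eps_p bounds the per-state L1 error of the transition model and rewards lie in [0, r_max]. The two-state chain and all names (`policy_value`, `classic_simulation_bound`, `trivial_bound`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def policy_value(P, r, gamma):
    """Exact value of a fixed policy: solves V = r + gamma * P @ V."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def classic_simulation_bound(eps_r, eps_p, gamma, r_max):
    """Assumed textbook form of the simulation lemma (not the paper's bound)."""
    return eps_r / (1 - gamma) + gamma * eps_p * r_max / (1 - gamma) ** 2


def trivial_bound(gamma, r_max):
    """With rewards in [0, r_max], any two value functions differ by at most this."""
    return r_max / (1 - gamma)


# Hypothetical two-state Markov chain induced by a fixed policy.
P_true = np.array([[0.90, 0.10],
                   [0.20, 0.80]])
# Misspecified model: only the first row is perturbed.
P_model = np.array([[0.85, 0.15],
                    [0.20, 0.80]])
r = np.array([1.0, 0.0])  # rewards shared by both models, so eps_r = 0
r_max = 1.0
eps_r = 0.0
# Largest per-state L1 distance between true and model transition rows.
eps_p = np.abs(P_true - P_model).sum(axis=1).max()

results = {}
for gamma in (0.5, 0.9, 0.99):
    v_true = policy_value(P_true, r, gamma)
    v_model = policy_value(P_model, r, gamma)
    err = np.max(np.abs(v_true - v_model))
    bound = classic_simulation_bound(eps_r, eps_p, gamma, r_max)
    trivial = trivial_bound(gamma, r_max)
    results[gamma] = (err, bound, trivial)
    print(f"gamma={gamma}: error={err:.4f}, classic bound={bound:.4f}, "
          f"trivial bound={trivial:.4f}, vacuous={bound > trivial}")
```

Under these assumptions the classic bound grows as 1 / (1 - gamma)^2 while the largest possible value error grows only as 1 / (1 - gamma). At gamma = 0.99 the assumed bound is roughly 990 against a trivial range of 100, which is the sense in which such a bound becomes vacuous for large discount factors.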