Latent Geodesics of Model Dynamics for Offline Reinforcement Learning

12 Oct 2021 (modified: 05 May 2023), Deep RL Workshop NeurIPS 2021, Readers: Everyone
Abstract: Model-based offline reinforcement learning approaches generally rely on bounds on model error. While contemporary methods obtain such bounds through an ensemble of models, we propose to estimate them using a data-driven latent metric. In particular, we build upon recent advances in the Riemannian geometry of generative models to construct a latent metric for an encoder-decoder based forward model. Our proposed metric measures both the quality of out-of-distribution samples and the discrepancy of examples in the data. We show that our metric can be viewed as a combination of two metrics, one relating to proximity and the other to epistemic uncertainty. Finally, we leverage our metric in a pessimistic model-based framework, showing a significant improvement on contemporary model-based offline reinforcement learning benchmarks.
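
As a rough illustration of the kind of latent metric the abstract refers to, the sketch below computes a pullback metric G(z) = J(z)^T J(z) of a decoder and uses it to measure the length of a latent path, which is then subtracted from a reward as a pessimism penalty. This is not the authors' implementation: the toy `decoder`, the finite-difference Jacobian, the straight-line path (which only upper-bounds the geodesic distance), and the penalty form `reward - lam * distance` are all assumptions made for illustration, and the sketch does not model the separate proximity and epistemic uncertainty components mentioned in the abstract.

```python
import numpy as np


def decoder(z):
    # Hypothetical toy decoder: maps a 2-D latent point onto a paraboloid in 3-D.
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])


def jacobian(f, z, eps=1e-6):
    # Central finite-difference Jacobian of f at z, shape (out_dim, latent_dim).
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)


def pullback_metric(f, z):
    # G(z) = J(z)^T J(z): the Euclidean metric of output space pulled back to latent space.
    J = jacobian(f, z)
    return J.T @ J


def curve_length(f, z_start, z_end, n_steps=200):
    # Length of the straight latent segment under the pullback metric.
    # The true geodesic is the shortest curve, so this is only an upper bound.
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    pts = z_start + ts[:, None] * (z_end - z_start)
    length = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        d = b - a
        G = pullback_metric(f, 0.5 * (a + b))  # metric evaluated at the segment midpoint
        length += float(np.sqrt(d @ G @ d))
    return length


def penalized_reward(reward, distance, lam=1.0):
    # Assumed pessimistic reward: subtract a distance-based penalty scaled by lam.
    return reward - lam * distance


if __name__ == "__main__":
    dist = curve_length(decoder, [0.0, 0.0], [1.0, 0.0])
    print("latent path length:", dist)
    print("penalized reward:", penalized_reward(1.0, dist, lam=0.5))
```

In this toy setting the path length exceeds the Euclidean latent distance wherever the decoder stretches the latent space, which is the intuition behind using such a metric as a penalty for state-action pairs far from the data.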