Less Suboptimal Learning and Control in Variational POMDPs

Anonymous

Published: 15 Jun 2022, Last Modified: 05 May 2023
SSL-RL 2021 Poster
Readers: Everyone

Keywords: variational inference, reinforcement learning, optimal control, state-space model, pomdp

TL;DR: A recently uncovered suboptimality in variational state-space models impacts control in partially-observable environments.

Abstract: A recently uncovered pitfall in learning generative models with amortised variational inference, the conditioning gap, calls into question common practices in model-based reinforcement learning. Withholding from the inference network some of the quantities on which the true posterior depends leads to a biased generative model and to an approximate posterior that underestimates uncertainty. We examine the effect of the conditioning gap on model-based reinforcement learning with variational world models. We study it in three settings with known dynamics, which lets us compare against a near-optimal policy. We find that the impact of the conditioning gap becomes severe in systems where the state is hard to estimate.
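The claim that a partially conditioned posterior underestimates uncertainty can be illustrated on a toy problem. The sketch below is our own illustration, not one of the paper's three settings: it assumes a scalar Gaussian model with z ~ N(0, prior_var) and two observations x1, x2 = z + N(0, noise_var) noise, where the inference network sees only x1 while the true posterior also depends on x2. Minimising the expected KL(q || p(z | x1, x2)) over x2 ~ p(x2 | x1) gives q* proportional to exp(E[log p(z | x1, x2)]). Because every full posterior in this model shares one variance, q* is Gaussian with that variance and with the averaged posterior mean. The helper names (gaussian_posterior, partially_conditioned_optimum) are hypothetical. The sketch covers only the posterior side of the argument, not the bias in the generative model.

```python
import numpy as np


def gaussian_posterior(prior_var, noise_var, xs):
    """Exact posterior N(mean, var) of z given observations xs, for the
    hypothetical toy model z ~ N(0, prior_var), x_i = z + eps_i,
    eps_i ~ N(0, noise_var)."""
    xs = np.atleast_1d(xs)
    precision = 1.0 / prior_var + len(xs) / noise_var
    var = 1.0 / precision
    mean = var * xs.sum() / noise_var
    return mean, var


def partially_conditioned_optimum(prior_var, noise_var, x1,
                                  n_samples=100_000, seed=0):
    """ELBO-optimal q(z | x1) when the inference network never sees x2.

    q* is proportional to exp(E_{x2 ~ p(x2|x1)}[log p(z | x1, x2)]).
    All full posteriors p(z | x1, x2) share one variance here, so q* is
    Gaussian with that variance and the Monte Carlo average posterior mean."""
    rng = np.random.default_rng(seed)
    mean1, var1 = gaussian_posterior(prior_var, noise_var, x1)
    # Sample x2 from its predictive p(x2 | x1): z ~ p(z | x1), x2 = z + noise.
    z = rng.normal(mean1, np.sqrt(var1), size=n_samples)
    x2 = z + rng.normal(0.0, np.sqrt(noise_var), size=n_samples)
    # Full-posterior variance (independent of the data) and means per sample.
    full_var = 1.0 / (1.0 / prior_var + 2.0 / noise_var)
    full_means = full_var * (x1 + x2) / noise_var
    return full_means.mean(), full_var


prior_var, noise_var, x1 = 1.0, 1.0, 0.8
q_mean, q_var = partially_conditioned_optimum(prior_var, noise_var, x1)
true_mean, true_var = gaussian_posterior(prior_var, noise_var, x1)
print(f"q*(z | x1): mean={q_mean:.3f}, var={q_var:.3f}")
print(f"p(z | x1):  mean={true_mean:.3f}, var={true_var:.3f}")
```

With these settings q* gets the mean of p(z | x1) right (up to Monte Carlo error), but its variance is 1/3 while the correct p(z | x1) has variance 1/2. The spread of the full-posterior means across unseen x2 is missing from q*, which is the underestimated uncertainty described above.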
