Keywords: variational inference, reinforcement learning, optimal control, state-space model, pomdp
TL;DR: A recently uncovered suboptimality in variational state-space models impacts control in partially-observable environments.
Abstract: A recently uncovered pitfall in learning generative models with amortised variational inference, the conditioning gap, questions common practices in model-based reinforcement learning. Withholding a part of the quantities that the true posterior depends on from the inference network leads to a biased generative model and an approximate posterior that underestimates uncertainty. We examine the effect of the conditioning gap on model-based reinforcement learning with variational world models. We study the effect in three settings with known dynamics, which enables us to compare to a near-optimal policy. Our finding is that the impact of the conditioning gap becomes severe in systems where the state is hard to estimate.