Less Suboptimal Learning and Control in Variational POMDPs

Mar 09, 2021 (edited Apr 16, 2021) · ICLR 2021 Workshop SSL-RL Blind Submission · Readers: Everyone
  • Keywords: variational inference, reinforcement learning, optimal control, state-space model, pomdp
  • TL;DR: A recently uncovered suboptimality in variational state-space models impacts control in partially-observable environments.
  • Abstract: A recently uncovered pitfall in learning generative models with amortised variational inference, the conditioning gap, calls common practices in model-based reinforcement learning into question. Withholding from the inference network some of the quantities that the true posterior depends on leads to a biased generative model and an approximate posterior that underestimates uncertainty. We examine the effect of the conditioning gap on model-based reinforcement learning with variational world models. We study the effect in three settings with known dynamics, which enables us to compare to a near-optimal policy. We find that the impact of the conditioning gap becomes severe in systems where the state is hard to estimate.
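The underestimation described in the abstract can be illustrated with a toy model. The sketch below is an assumption-laden example, not taken from the paper: a linear-Gaussian latent z with two noisy observations, where an amortised encoder conditioned on only one observation must approximate the full posterior for every value of the withheld one at once.

```python
# Toy illustration of the conditioning gap (hypothetical linear-Gaussian
# setup, not from the paper): latent z ~ N(0, 1), observations
# x_i = z + N(0, r) for i = 1, 2.  An amortised encoder q(z | x1) that
# never sees x2 must approximate p(z | x1, x2) for *every* x2 at once.

prior_var = 1.0   # variance of the prior p(z)
r = 1.0           # observation-noise variance

# Full posterior p(z | x1, x2): Gaussian precisions add, so
# precision = 1/prior_var + 2/r.
full_posterior_var = 1.0 / (1.0 / prior_var + 2.0 / r)

# Correct partially-conditioned posterior p(z | x1): only one observation
# contributes, so precision = 1/prior_var + 1/r.
correct_var = 1.0 / (1.0 / prior_var + 1.0 / r)

# Minimising the expected reverse KL, E_{x2}[KL(q || p(z | x1, x2))],
# over Gaussian q: setting the derivative w.r.t. the variance of q to
# zero gives exactly the per-posterior variance.  The spread of the
# posterior means as x2 varies is ignored, so q is overconfident.
optimal_q_var = full_posterior_var

print(f"p(z | x1) variance:               {correct_var:.3f}")
print(f"partially-conditioned q variance: {optimal_q_var:.3f}")
# The amortised q underestimates uncertainty: 0.333 < 0.500.
```

With these numbers, the correct posterior given x1 alone has variance 0.5, while the expected-KL-optimal partially-conditioned q collapses to the full-posterior variance of 1/3, matching the abstract's claim that the approximate posterior underestimates uncertainty.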