VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Luisa Zintgraf; Kyriacos Shiarlis; Maximilian Igl; Sebastian Schulze; Yarin Gal; Katja Hofmann; Shimon Whiteson

Published: 20 Dec 2019, Last Modified: 22 Jun 2025. ICLR 2020 Conference Blind Submission. Readers: Everyone

TL;DR: VariBAD opens a path to tractable approximate Bayes-optimal exploration for deep RL using ideas from meta-learning, Bayesian RL, and approximate variational inference.

Abstract: Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent’s uncertainty about the environment. Computing a Bayes-optimal policy is however intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods.

Keywords: Meta-Learning, Bayesian Reinforcement Learning, BAMDPs, Deep Reinforcement Learning

Code: [lmzintgraf/varibad](https://github.com/lmzintgraf/varibad) + [2 community implementations](https://paperswithcode.com/paper/?openreview=Hkl9JlBYvr)

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco)

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/varibad-a-very-good-method-for-bayes-adaptive/code)
