Bayesian Ensembles for Exploration in Deep Q-Learning

Published: 13 Mar 2024, Last Modified: 22 Apr 2024 · ALA 2024 · CC BY 4.0
Keywords: reinforcement learning, exploration, ensembles, Bayesian
TL;DR: Use Sequential Monte Carlo to approximately sample from the posterior distribution over Bayesian Q-networks.
Abstract: Exploration in reinforcement learning remains a difficult challenge. To drive exploration, ensembles with randomized prior functions have recently been popularized as a way to quantify uncertainty in the value model. However, these ensembles lack a theoretical justification for why they should resemble the actual posterior. In this work, we view the training of ensembles from the perspective of Sequential Monte Carlo, a Monte Carlo method that approximates a sequence of distributions with a set of particles, and propose an algorithm that exploits both the practical flexibility of ensembles and the theory of the Bayesian paradigm. We incorporate this method into a standard DQN agent and experimentally show qualitatively good uncertainty quantification and improved exploration capabilities over a regular ensemble.
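To make the Sequential Monte Carlo idea in the abstract concrete, the following is a minimal illustrative sketch of the generic SMC reweight-and-resample step over an ensemble of particles. It is not the paper's algorithm: the Gaussian-like log-likelihood, the ESS-based resampling threshold, and the scalar particles are all assumptions chosen for a self-contained toy example; in the paper's setting each particle would be the parameters of a Q-network.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_step(particles, log_weights, log_likelihood, rng, ess_frac=0.5):
    """One generic SMC step: reweight particles toward the next target
    distribution, then resample if the effective sample size is too low.
    (Illustrative sketch; threshold and likelihood are assumptions.)"""
    # Reweight: incorporate the new target's log-likelihood.
    log_weights = log_weights + log_likelihood(particles)
    log_weights -= np.max(log_weights)          # numerical stability
    weights = np.exp(log_weights)
    weights /= weights.sum()

    # Resample when the effective sample size drops below a fraction
    # of the ensemble size, to avoid weight degeneracy.
    ess = 1.0 / np.sum(weights ** 2)
    if ess < ess_frac * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        log_weights = np.zeros(len(particles))  # uniform after resampling
    else:
        log_weights = np.log(weights)
    return particles, log_weights

# Toy usage: scalar "parameters" drawn from a N(0, 1) prior, reweighted
# by a likelihood that favors values near 2.
particles = rng.normal(size=100)
log_w = np.zeros(100)
particles, log_w = smc_step(
    particles, log_w, lambda p: -0.5 * (p - 2.0) ** 2, rng)
```

In the paper's context, each "particle" is one member of the ensemble of Bayesian Q-networks, and the sequence of distributions tracked by SMC corresponds to the evolving posterior over value-function parameters as data arrives.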
Supplementary Material: pdf
Type Of Paper: Full paper (max 8 pages)
Anonymous Submission: Anonymized submission.
Submission Number: 7