Double Gumbel Q-Learning

Published: 21 Sept 2023, Last Modified: 13 Jan 2024 · NeurIPS 2023 spotlight
Keywords: deep reinforcement learning, Q-Learning, TD-Learning with function approximation, extreme value theory, maximum-likelihood estimation, moment-matching
TL;DR: DoubleGum is a well-performing Q-Learning algorithm that models noise with two heteroscedastic Gumbel distributions.
Abstract: We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q-Learning, a Deep Q-Learning algorithm applicable to both discrete and continuous control. In discrete control, we derive a closed-form expression for the loss function of our algorithm. In continuous control, this loss function is intractable, so we derive an approximation with a hyperparameter whose value regulates pessimism in Q-Learning. We present a default value for this pessimism hyperparameter that enables DoubleGum to outperform DDPG, TD3, SAC, XQL, quantile regression, and Mixture-of-Gaussian Critics in aggregate over 33 tasks from DeepMind Control, MuJoCo, MetaWorld, and Box2D, and we show that tuning this hyperparameter may further improve sample efficiency.
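To make the core idea concrete, the following is a minimal sketch (not the paper's exact loss) of maximum-likelihood estimation under a heteroscedastic Gumbel noise model: TD residuals are scored by a Gumbel negative log-likelihood whose scale parameter is predicted per sample, as a network head might do. The function name `gumbel_nll` and the per-sample `log_beta` input are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def gumbel_nll(residual, log_beta):
    """Negative log-likelihood of TD residuals under Gumbel(0, beta) noise.

    residual : TD error, e.g. target Q minus predicted Q (array or scalar).
    log_beta : log of the heteroscedastic scale, e.g. output of a network
               head (illustrative assumption; the paper's parameterization
               may differ).

    Gumbel(0, beta) has density (1/beta) * exp(-(z + exp(-z))) with
    z = x / beta, so the NLL is log(beta) + z + exp(-z).
    """
    beta = np.exp(log_beta)
    z = residual / beta
    return log_beta + z + np.exp(-z)

# At the mode (residual = 0) with unit scale (log_beta = 0),
# the NLL reduces to log(1) + 0 + exp(0) = 1.
print(gumbel_nll(0.0, 0.0))
```

Minimizing this NLL jointly over the Q-network and the scale head would fit both the location and the spread of the noise, which is the moment-matching flavor suggested by the keywords above; the closed-form discrete-control loss and the continuous-control approximation in the paper are derived from this kind of model.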
Supplementary Material: pdf
Submission Number: 13491