TL;DR: A novel algorithm is introduced that uses distributional RL to locally adjust the overestimation bias in Q-learning.
Abstract: Bias in the estimation of Q-values is a well-known obstacle that slows down convergence of Q-learning and actor-critic methods. Part of the success of modern RL algorithms stems from direct or indirect overestimation-reduction mechanisms. We introduce an easy-to-implement method, built on top of distributional reinforcement learning (DRL) algorithms, that deals with overestimation in a locally adaptive way. Our framework ADDQ is simple to implement: existing DRL implementations can be improved with a few lines of code. We provide theoretical backing, a proof of convergence in the tabular case, and experimental results in tabular, Atari, and MuJoCo environments, including comparisons with state-of-the-art methods.
Lay Summary: Scientists working on computers that learn by trial and error (reinforcement learning) have long noticed that such systems sometimes estimate the value of actions incorrectly. Overestimating what an action might earn can slow down the learning process. Many modern methods already include mechanisms to reduce these mistakes.
The proposed method builds on an approach called distributional reinforcement learning. In simple terms, this approach considers the range of possible outcomes rather than just one average value. By doing so, it can adjust its predictions more accurately in different situations.
A key advantage is that this new method is easy to add to existing implementations: only a few lines of code are needed. We back the idea with theoretical analysis and demonstrate how to integrate it with another technique known as double Q-learning. We tested the approach in various scenarios, including simple grid-like problems, video games (Atari), and simulated robotics (MuJoCo).
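To make the idea above concrete, here is a minimal, purely illustrative sketch of the kind of mechanism described: two return-distribution estimates (as in double Q-learning) are blended with a weight chosen locally from the spread of the distribution. The weighting rule, names, and all numbers below are assumptions for illustration, not the paper's actual ADDQ update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two independent distributional estimates per
# (state, action), each a set of return "atoms" instead of one average.
n_states, n_actions, n_atoms = 4, 2, 51
Z1 = rng.normal(0.0, 1.0, size=(n_states, n_actions, n_atoms))
Z2 = rng.normal(0.0, 1.0, size=(n_states, n_actions, n_atoms))

def blended_value(s, a):
    """Blend the two estimates with a locally chosen weight.

    The spread of the return distribution serves as a rough local
    signal: where outcomes vary more, lean more on the cross-estimate
    (double-Q style) to damp overestimation. Illustrative rule only.
    """
    spread = Z1[s, a].std()
    beta = 1.0 / (1.0 + spread)   # in (0, 1]; smaller when spread is large
    q1 = Z1[s, a].mean()          # value from the first estimate
    q2 = Z2[s, a].mean()          # value from the second estimate
    return beta * q1 + (1.0 - beta) * q2

print(blended_value(0, 0))
```

Because the blend is a convex combination, the result always lies between the two individual value estimates; the local weight simply decides how far it moves toward the more conservative one.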
In summary, this new method helps learning systems make better estimates of the value of their actions, leading to faster and more reliable learning.
Link To Code: https://github.com/BommeHD/ADDQ
Primary Area: Reinforcement Learning
Keywords: Reinforcement learning, Q-learning, overestimation bias, distributional RL, Atari, MuJoCo
Flagged For Ethics Review: true
Submission Number: 6410