Value Shaping: Bias Reduction in Bellman Error for Deep Reinforcement Learning

18 Sept 2025 (modified: 01 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Bellman error, bias reduction, affine reward transformation
TL;DR: We aim to enhance the sample efficiency of deep reinforcement learning algorithms by compensating for the bias in the Bellman error.
Abstract: The Bellman error plays a crucial role as an objective function in deep reinforcement learning (DRL), serving as a proxy for the value error. However, this proxy relationship does not guarantee equivalence between the two: the Bellman error inherently contains a bias that can lead to unexpected optimization behavior. In this paper, we investigate the relationship between the value error and the Bellman error and analyze why this bias makes the Bellman error an unreliable proxy. Leveraging the linear structure of the Bellman equation, we propose a method that compensates for the bias by adjusting the reward function, while ensuring that the adjustment does not alter the optimal policy. In practice, we run two parallel Bellman iteration processes: one estimates the bias, and the other updates the value function with minimal bias. Our method learns a low-bias Q-function and can be readily integrated into existing mainstream RL algorithms. Experimental results across multiple environments demonstrate that our approach improves sample efficiency, achieves superior performance, and holds promise as a fundamental technique in reinforcement learning.
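For context, the policy-invariance claim rests on a standard property of affine reward transformations in discounted MDPs: scaling the reward by a positive constant and adding a constant offset rescales and uniformly shifts the optimal action-value function, leaving the greedy policy unchanged. A minimal statement of this fact (the constants $\alpha > 0$, $\beta$, and discount $\gamma \in [0,1)$ are generic notation, not the paper's; the paper's specific bias estimator is not reproduced here):

\[
  \tilde r(s,a) \;=\; \alpha\, r(s,a) + \beta, \qquad \alpha > 0,
\]
\[
  \tilde Q^{*}(s,a) \;=\; \alpha\, Q^{*}(s,a) + \frac{\beta}{1-\gamma},
  \qquad
  \arg\max_{a} \tilde Q^{*}(s,a) \;=\; \arg\max_{a} Q^{*}(s,a).
\]

An additive reward correction therefore changes every optimal Q-value by the same constant $\beta/(1-\gamma)$, so the reward adjustment used to offset the bias can, in principle, be applied without altering which actions are optimal.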
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 10674