Abstract: Value function learning plays a central role in many state-of-the-art reinforcement learning algorithms. However, many standard algorithms such as Q-learning lose their convergence guarantees when combined with function approximation, and divergence is often observed in practice. In this paper, we propose a novel loss function whose global minimizer is the true value function. The key advantage of this new loss is that its gradient can be easily approximated from sampled transitions, avoiding the double-sample issue faced by prior methods such as residual gradient. In practice, our approach can be combined with general differentiable function classes such as neural networks, and we show that it works reliably and effectively on several benchmarks.
Code Link: https://github.com/lewisKit/Kernel-Bellman-Loss
CMT Num: 8945
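Since the abstract does not spell out the loss itself, the following is a minimal sketch of how a kernel-weighted Bellman loss of the kind suggested by the repository name might be estimated from a single batch of sampled transitions. The RBF kernel, bandwidth, value network, and function names are illustrative assumptions, not the paper's exact formulation; the point of the sketch is that each transition's Bellman residual is paired with the residuals of other transitions through the kernel, so the gradient can be estimated without drawing two independent next states per state.

```python
# Hedged sketch of a kernel-weighted Bellman loss over sampled transitions.
# The RBF kernel, bandwidth, and network architecture are assumptions for
# illustration only.
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Small MLP value function V(s); architecture is illustrative."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def kernel_bellman_loss(value_net, s, r, s_next, done, gamma=0.99, bandwidth=1.0):
    """Kernel-weighted quadratic form of Bellman residuals over a batch.

    Because each residual is multiplied with residuals of *other* transitions
    through the kernel, the estimator does not square the noise of a single
    sample, which is what creates the double-sample issue in residual gradient.
    """
    # Bellman residuals: delta_i = r_i + gamma * V(s'_i) - V(s_i)
    delta = r + gamma * (1.0 - done) * value_net(s_next) - value_net(s)
    # RBF kernel matrix K_ij = exp(-||s_i - s_j||^2 / (2 * bandwidth^2))
    sq_dist = torch.cdist(s, s) ** 2
    K = torch.exp(-sq_dist / (2.0 * bandwidth ** 2))
    n = s.shape[0]
    # Drop diagonal terms (U-statistic style average over distinct pairs).
    K = K * (1.0 - torch.eye(n, device=s.device))
    return (delta.unsqueeze(0) * K * delta.unsqueeze(1)).sum() / (n * (n - 1))

if __name__ == "__main__":
    # Usage example on random data (shapes only, for illustration).
    torch.manual_seed(0)
    net = ValueNet(state_dim=4)
    s, s_next = torch.randn(32, 4), torch.randn(32, 4)
    r, done = torch.randn(32), torch.zeros(32)
    loss = kernel_bellman_loss(net, s, r, s_next, done)
    loss.backward()  # gradients flow to the value network parameters
    print(float(loss))
```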