Multi-Bellman operator for convergence of $Q$-learning with linear function approximation

TMLR Paper3697 Authors

15 Nov 2024 (modified: 21 Nov 2024) · Under review for TMLR · Everyone · CC BY 4.0
Abstract: We investigate the convergence of $Q$-learning with linear function approximation and introduce the multi-Bellman operator, an extension of the traditional Bellman operator. By analyzing the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes a contraction, yielding stronger fixed-point guarantees compared to the original Bellman operator. Building on these insights, we propose the multi-$Q$-learning algorithm, which achieves convergence and approximates the optimal solution with arbitrary precision. This contrasts with traditional $Q$-learning, which lacks such convergence guarantees. Finally, we empirically validate our theoretical results.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Mohammad_Emtiyaz_Khan1
Submission Number: 3697