Multi-Bellman operator for convergence of $Q$-learning with linear function approximation

Published: 15 Apr 2025, Last Modified: 15 Apr 2025. Accepted by TMLR. License: CC BY 4.0.
Abstract: We investigate the convergence of $Q$-learning with linear function approximation and introduce the multi-Bellman operator, an extension of the traditional Bellman operator. By analyzing the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes a contraction, yielding stronger fixed-point guarantees compared to the original Bellman operator. Building on these insights, we propose the multi-$Q$-learning algorithm, which achieves convergence and approximates the optimal solution with arbitrary precision. This contrasts with traditional $Q$-learning, which lacks such convergence guarantees. Finally, we empirically validate our theoretical results.
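The core object of the abstract, the multi-Bellman operator, is the optimal Bellman operator $H$ composed with itself $m$ times, i.e. $H^m$. The sketch below (not the paper's code) illustrates on a hypothetical two-state, two-action MDP why iterating $H$ helps: each application of $H$ is a $\gamma$-contraction in the sup norm, so $H^m$ contracts by $\gamma^m$, which is what makes the *projected* multi-Bellman operator easier to turn into a contraction for large enough $m$. All rewards, dynamics, and the discount factor here are made-up illustration values.

```python
# Illustrative sketch only: multi-Bellman operator H^m on a tiny
# hypothetical MDP (2 states, 2 actions, deterministic transitions).
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
# next_state[s][a] and reward[s][a] -- made-up example dynamics.
next_state = [[0, 1], [0, 1]]
reward = [[1.0, 0.0], [0.0, 2.0]]

def bellman(q):
    """One application of the optimal Bellman operator H:
    (Hq)(s, a) = r(s, a) + gamma * max_a' q(s', a')."""
    return [[reward[s][a] + GAMMA * max(q[next_state[s][a]])
             for a in range(N_ACTIONS)] for s in range(N_STATES)]

def multi_bellman(q, m):
    """The multi-Bellman operator H^m: H composed m times."""
    for _ in range(m):
        q = bellman(q)
    return q

# H^m contracts distances between Q-functions by gamma^m, so two
# arbitrary initial Q-functions are pulled together m times as fast.
qa = multi_bellman([[0.0] * N_ACTIONS for _ in range(N_STATES)], 3)
qb = multi_bellman([[10.0] * N_ACTIONS for _ in range(N_STATES)], 3)
dist = max(abs(qa[s][a] - qb[s][a])
           for s in range(N_STATES) for a in range(N_ACTIONS))
```

In the tabular case this buys nothing new (the fixed point of $H^m$ is still $q^*$), but with linear function approximation the projection step interacts with $H$, and the $\gamma^m$ contraction factor is what the paper leverages to obtain fixed-point guarantees that plain projected $H$ lacks.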
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: After the acceptance notification, in addition to the modifications made to the revised paper (after reviews), we made the following changes for the camera-ready version, in accordance with the suggestions from the reviewers and the action editor (AE):
- Rephrased a sentence in the second paragraph of Section 7.1 to explicitly mention the exponential complexity of the proposed method (suggested by the AE);
- Added a final sentence to the third paragraph of Section 7.1, suggesting as a research direction the use of our method in real-world applications such as the training of LLMs (suggested by the AE in response to a comment from a reviewer);
- Added subscripts $s, a$ to the rewards $r$ in Algorithm 2, for clarity (suggested by a reviewer).
Assigned Action Editor: ~Mohammad_Emtiyaz_Khan1
Submission Number: 3697