The online $k$-server problem on graphs is a fundamental computational problem that models a wide range of practical problems, such as dispatching ambulances to serve accidents or dispatching taxis to serve ride requests. While most prior work on the $k$-server problem focused on online algorithms, reinforcement learning promises policies that require low computational effort during execution, which is critical in time-sensitive applications such as ambulance dispatch. However, no scalable reinforcement-learning approach exists for the $k$-server problem. To address this gap, we introduce a scalable computational approach for learning generalist policies. Besides scalability, the advantage of generalist policies is transferability: a generalist policy can be applied to an entire class of graphs without retraining, which is crucial for practical applications, e.g., in ambulance dispatch problems where road conditions or demand distributions may change over time. We achieve scalability and transferability by introducing a novel architecture that decomposes the action-value into a global and a local term, both estimated from a shared graph-convolution backbone. We evaluate our approach on a variety of graph classes, comparing it against well-established baselines and demonstrating the performance and transferability of our generalist policies.
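To make the decomposition concrete, below is a minimal PyTorch sketch of what such an architecture could look like. It is an illustrative assumption, not the paper's actual implementation: the names (`GraphConv`, `DecomposedQNet`), the dense-adjacency message passing, and the mean-pooling used for the global term are all choices made here for brevity.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer: mix each node's features with its neighbors'."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (n, in_dim) node features; adj: (n, n) normalized adjacency
        return torch.relu(self.lin(adj @ x))

class DecomposedQNet(nn.Module):
    """Q(s, a) = global (graph-level) term + local (per-node) term,
    both read off a shared graph-convolution backbone."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.backbone = nn.ModuleList([GraphConv(in_dim, hid_dim),
                                       GraphConv(hid_dim, hid_dim)])
        self.global_head = nn.Linear(hid_dim, 1)  # one value for the whole graph
        self.local_head = nn.Linear(hid_dim, 1)   # one value per node (action)

    def forward(self, x, adj):
        h = x
        for layer in self.backbone:
            h = layer(h, adj)
        g = self.global_head(h.mean(dim=0))       # (1,)  global term, pooled
        l = self.local_head(h).squeeze(-1)        # (n,)  local term per node
        return g + l                               # (n,)  Q-value per action

# Usage: pick the dispatch target that maximizes the estimated action-value.
n, d = 8, 4
adj = torch.eye(n)                 # placeholder normalized adjacency
x = torch.randn(n, d)              # placeholder node features
q = DecomposedQNet(d, 16)(x, adj)
action = q.argmax().item()
```

Note that under this sketch no parameter depends on the number of nodes, so the same weights apply to graphs of any size within a class; this size-invariance is what would let a single generalist policy transfer without retraining.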