The online $k$-server problem on graphs is a fundamental computational problem that models a wide range of practical problems, such as dispatching ambulances to serve accidents or dispatching taxis to serve ride requests. While most prior work on the $k$-server problem focused on online algorithms, reinforcement learning promises policies that require low computational effort during execution, which is critical in time-sensitive applications such as ambulance dispatch. However, no scalable reinforcement-learning approach exists for the $k$-server problem. To address this gap, we introduce a scalable computational approach for learning generalist policies. Besides scalability, the advantage of generalist policies is transferability: a generalist policy can be applied to an entire class of graphs without retraining, which is crucial for practical applications, e.g., in ambulance dispatch problems where road conditions or demand distributions may change over time. We achieve scalability and transferability by introducing a novel architecture that decomposes the action-value into a global and a local term, both estimated from a shared graph-convolution backbone. We evaluate our approach on a variety of graph classes, comparing it against well-established baselines and demonstrating the performance and transferability of our generalist policies.
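To make the decomposition concrete, below is a minimal PyTorch sketch of what such an architecture could look like. It is an illustrative assumption, not the paper's actual implementation: the names (`GraphConv`, `DecomposedQNet`), the dense-adjacency message passing, and the mean-pooling used for the global term are all choices made here for brevity.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer: mix each node's features with its neighbors'."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (n, in_dim) node features; adj: (n, n) normalized adjacency
        return torch.relu(self.lin(adj @ x))

class DecomposedQNet(nn.Module):
    """Q(s, a) = global (graph-level) term + local (per-node) term,
    both read off a shared graph-convolution backbone."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.backbone = nn.ModuleList([GraphConv(in_dim, hid_dim),
                                       GraphConv(hid_dim, hid_dim)])
        self.global_head = nn.Linear(hid_dim, 1)  # one value for the whole graph
        self.local_head = nn.Linear(hid_dim, 1)   # one value per node (action)

    def forward(self, x, adj):
        h = x
        for layer in self.backbone:
            h = layer(h, adj)
        g = self.global_head(h.mean(dim=0))       # (1,)  global term, pooled
        l = self.local_head(h).squeeze(-1)        # (n,)  local term per node
        return g + l                               # (n,)  Q-value per action

# Usage: pick the dispatch target that maximizes the estimated action-value.
n, d = 8, 4
adj = torch.eye(n)                 # placeholder normalized adjacency
x = torch.randn(n, d)              # placeholder node features
q = DecomposedQNet(d, 16)(x, adj)
action = q.argmax().item()
```

Note that under this sketch no parameter depends on the number of nodes, so the same weights apply to graphs of any size within a class; this size-invariance is what would let a single generalist policy transfer without retraining.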