Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Reinforcement Learning, Multi-agent Systems, Large-scale Systems, Mean-field Approximation
TL;DR: We develop a scalable algorithm for global decision-making in the presence of many local agents, where the global decision-maker subsamples local agents when making decisions that maximize the system reward, thereby overcoming the curse of dimensionality.
Abstract:

We study reinforcement learning for global decision-making in the presence of local agents, where the global decision-maker makes decisions affecting all local agents, and the objective is to learn a policy that maximizes the joint rewards of all the agents. Such problems arise in many applications, e.g., demand response, EV charging, and queueing. In this setting, scalability has been a long-standing challenge due to the size of the joint state space, which can be exponential in the number of agents. This work proposes the \texttt{SUBSAMPLE-Q} algorithm, in which the global agent subsamples $k\leq n$ local agents to compute a policy in time that is polynomial in $k$. We show that this learned policy converges to the optimal policy at a rate of $\tilde{O}(1/\sqrt{k}+\epsilon_{k,m})$ as the number of subsampled agents $k$ increases, where $\epsilon_{k,m}$ is the Bellman noise. Finally, we validate our theoretical results through numerical simulations in demand-response and queueing settings.
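To make the subsampling idea concrete, the following Python sketch illustrates it in a toy setting: the global agent runs tabular Q-learning conditioned only on its own state and a permutation-invariant summary (empirical counts) of $k$ randomly subsampled local-agent states, so the table size grows polynomially in $k$ rather than exponentially in $n$. This is only a rough illustration under assumptions; the environment dynamics, reward, variable names, and hyperparameters below are hypothetical and are not the paper's \texttt{SUBSAMPLE-Q} specification.

```python
# Hypothetical sketch of subsampling-based Q-learning for a global agent
# interacting with n local agents. Not the paper's algorithm; a toy stand-in
# for settings such as demand response or queueing.
import random
from collections import Counter, defaultdict

n_agents = 50           # total number of local agents (n)
k = 5                   # number of subsampled local agents (k <= n)
local_states = [0, 1]   # toy binary local state
actions = [0, 1, 2]     # toy global action set
alpha, gamma, eps = 0.1, 0.95, 0.1

def empirical_key(sampled_states):
    # Permutation-invariant summary of the subsample: counts of each
    # local-state value. This keeps the Q-table polynomial in k.
    c = Counter(sampled_states)
    return tuple(c.get(s, 0) for s in local_states)

Q = defaultdict(float)  # Q[(global_state, empirical_key, action)]

def step(global_state, local, action):
    # Toy dynamics/reward: reward is 1 when the global action matches the
    # majority local state; local states flip independently with prob 0.1.
    majority = int(sum(local) > len(local) / 2)
    reward = 1.0 if action == majority else 0.0
    local = [1 - s if random.random() < 0.1 else s for s in local]
    return (global_state + 1) % 3, local, reward

def select_action(global_state, local):
    # Subsample k local agents, summarize them, and act eps-greedily on the
    # learned Q-values over the summarized (subsampled) state.
    key = empirical_key(random.sample(local, k))
    if random.random() < eps:
        return random.choice(actions), key
    return max(actions, key=lambda a: Q[(global_state, key, a)]), key

global_state = 0
local = [random.choice(local_states) for _ in range(n_agents)]
for t in range(20000):
    action, key = select_action(global_state, local)
    next_gs, local, reward = step(global_state, local, action)
    next_key = empirical_key(random.sample(local, k))
    best_next = max(Q[(next_gs, next_key, a)] for a in actions)
    sa = (global_state, key, action)
    Q[sa] += alpha * (reward + gamma * best_next - Q[sa])
    global_state = next_gs
```

Note that the reward in this toy depends on the full local population while the policy only observes a size-$k$ subsample, which loosely mirrors the approximation error that shrinks as $k$ grows.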

Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11550