On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement LearningDownload PDF

06 Jul 2022, 05:42 (modified: 13 Sept 2022, 11:29)Accepted by TMLRReaders: Everyone
Abstract: We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies. Specifically, we prove that, if $|\mathcal{X}|, |\mathcal{U}|$ denote the size of state, and action spaces of individual agents, then for sufficiently small discount factor, the approximation error is given by $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]$. Moreover, in a special case where the reward and state transition functions are independent of the action distribution of the population, the error improves to $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\sqrt{|\mathcal{X}|}$. Finally, we also devise an algorithm to explicitly construct a local policy. With the help of our approximation results, we further establish that the constructed local policy is within $\mathcal{O}(\max\{e,\epsilon\})$ distance of the optimal policy, and the sample complexity to achieve such a local policy is $\mathcal{O}(\epsilon^{-3})$, for any $\epsilon>0$.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=WmWSlG93iR&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: We have incorporated the changes suggested by the reviewers in the camera-ready version.
Code: https://github.itap.purdue.edu/Clan-labs/NearOptimalLocalPolicy
Assigned Action Editor: ~Marc_Lanctot1
10 Replies

Loading