Abstract: Parameter sharing, where each agent independently learns a policy with fully shared parameters between all policies, is a popular baseline method for multi-agent deep reinforcement learning. Unfortunately, since all agents share the same policy network, they cannot learn different policies or tasks. This issue has been circumvented experimentally by adding an agent-specific indicator signal to observations, which we term "agent indication." Agent indication is limited, however, in that without modification it does not allow parameter sharing to be applied to environments where the action spaces and/or observation spaces are heterogeneous. This work formalizes the notion of agent indication and proves, for the first time, that it enables convergence to optimal policies. Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces, and prove that these methods allow for convergence to optimal policies. Finally, we experimentally confirm that the methods we introduce work in practice, and conduct a wide array of experiments studying the empirical efficacy of many different agent indication schemes for image-based observation spaces.
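To make the idea in the abstract concrete, the following is a minimal sketch of agent indication on flat vector observations. It is illustrative only and not taken from the manuscript: the one-hot indicator, the zero-padding used to bring heterogeneous observations to a common length, and all function names (`pad_observation`, `indicate_agent`, `shared_policy_input`) are assumptions chosen for the example.

```python
from typing import List, Sequence


def pad_observation(obs: Sequence[float], max_len: int) -> List[float]:
    """Zero-pad a flat observation to a common length (assumed scheme)."""
    if len(obs) > max_len:
        raise ValueError("observation is longer than max_len")
    return list(obs) + [0.0] * (max_len - len(obs))


def indicate_agent(obs: Sequence[float], agent_idx: int, n_agents: int) -> List[float]:
    """Append a one-hot agent indicator to a flat observation (assumed scheme)."""
    if not 0 <= agent_idx < n_agents:
        raise ValueError("agent_idx out of range")
    one_hot = [0.0] * n_agents
    one_hot[agent_idx] = 1.0
    return list(obs) + one_hot


def shared_policy_input(
    obs: Sequence[float], agent_idx: int, n_agents: int, max_len: int
) -> List[float]:
    """Build the input a single shared policy network would receive."""
    return indicate_agent(pad_observation(obs, max_len), agent_idx, n_agents)


# Two agents with different observation sizes map to inputs of equal length.
x0 = shared_policy_input([0.5, 1.0], agent_idx=0, n_agents=2, max_len=3)
x1 = shared_policy_input([0.1, 0.2, 0.3], agent_idx=1, n_agents=2, max_len=3)
```

Because both inputs have the same length and differ in the indicator entries, a single shared network can in principle condition its output on which agent is acting. Image-based observations would need a different encoding of the indicator, which this sketch does not cover.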
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=RpkIyRn3jW
Changes Since Last Submission: As suggested by the action editor, we have added further analysis to the experiments in Section 5. We have added error plots and a heat map to Figure 2, and learning graphs to the appendix. We have correspondingly expanded the conclusions and takeaways drawn from our plots in Section 5.
Along with the experiments, we have highlighted our proposed agent indication methods for image-based observations in Section 4 and Figure 1. We have added details on our method and cited the relevant previous work.
We have also cleaned up the writing and emphasized the main novelty of the manuscript.
Assigned Action Editor: ~Martha_White1
Submission Number: 1359