A2FC: A FEDERATED ADVANTAGE ACTOR-CRITIC LEARNING APPROACH FOR HETEROGENEOUS ACTION SPACES

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: multi-agent reinforcement learning, federated learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A novel federated A2C (reinforcement learning) approach for agents' heterogeneous action spaces
Abstract: The growth of the Internet of Things (IoT) and the increasing demand for real-time networking have created a growing need for multiple reinforcement learning (RL) agents to train collaboratively within a shared environment toward common objectives. The multi-agent Advantage Actor-Critic (A2C) algorithm is gaining popularity in Multi-Agent Reinforcement Learning (MARL) systems. However, this approach requires agents to share policy components with neighboring agents because each agent observes the environment only partially. This practice increases communication overhead and raises privacy concerns. Federated learning (FL), recognized as a privacy-preserving machine learning method, can be applied in the MARL context, with a central server aggregating the weights of the agents' actor and critic models. However, this technique assumes that all agents share an identical action space, which may be impractical. To overcome these shortcomings, we introduce a novel FL A2C algorithm called "Advantage Actor Federated Critic (A2FC)". The proposed algorithm aggregates only the agents' critic models on a central server while keeping the training of actor models on each agent's local machine. An empirical experiment conducted in an adaptive traffic signal control (ATSC) system demonstrates the method's effectiveness in personalizing agents' actions, preserving agents' privacy during training, and mitigating communication overhead.
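To make the split between shared critics and local actors concrete, the following is a minimal, hypothetical PyTorch sketch of the server-side aggregation step the abstract describes: the server performs FedAvg-style averaging over critic parameters only, while each agent's actor (and hence its action space) never leaves the local machine. The class and function names are illustrative assumptions, not the authors' code.

```python
import copy
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Value network; only these weights are sent to the server."""
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, obs):
        return self.net(obs)

def aggregate_critics(critics):
    """Server-side FedAvg over the agents' critic parameters only."""
    global_state = copy.deepcopy(critics[0].state_dict())
    for key in global_state:
        # Average each parameter tensor across all participating agents.
        global_state[key] = torch.stack(
            [c.state_dict()[key].float() for c in critics]
        ).mean(dim=0)
    return global_state

# After aggregation, each agent reloads the shared critic and continues
# training its own actor locally, e.g.:
#   critic.load_state_dict(aggregate_critics(all_critics))
```

Because only critic weights are exchanged, agents with heterogeneous action spaces can still benefit from a shared value estimate without exposing their policies.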
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3115