Abstract: Existing multi-agent reinforcement learning methods that follow the centralized training with decentralized execution paradigm train agents to learn coordinated actions that yield an optimal joint policy. However, if some agents degrade during the execution phase and produce noisy actions to varying degrees, these methods coordinate poorly. In this paper, we show how such random noise in agents, which could result from the degradation or aging of robots, adds uncertainty to coordination and thereby contributes to unsatisfactory global rewards. In such a scenario, the agents that still act in accordance with the policy must understand the behavior and limitations of the noisy agents, while the noisy agents must plan in cognizance of their own limitations. Our proposed method, Deep Stochastic Discount Factor (DSDF), tunes the discount factor for each agent uniquely based on its degree of degradation, thereby altering the agents' global planning. Moreover, since the degree of degradation of some agents is expected to change over time, our method provides a framework under which such changes can be addressed incrementally without extensive retraining. Results on benchmark environments show the efficacy of the DSDF approach compared with existing approaches.
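To make the core idea concrete, the sketch below illustrates a per-agent discount factor derived from a degradation estimate and plugged into a one-step TD target. This is a minimal, hypothetical illustration only: the abstract states that DSDF tunes each agent's discount factor from its degree of degradation, but the actual mechanism in the paper is learned; the hand-coded monotone mapping, the `GAMMA_MAX`/`GAMMA_MIN` constants, and the function names here are all assumptions standing in for that learned component.

```python
import numpy as np

# Hypothetical illustration of a per-agent discount factor.
# In DSDF the per-agent discount is produced by a learned (deep,
# stochastic) component; this simple monotone mapping merely stands
# in for it to show how a noisier agent ends up planning over a
# shorter horizon.

GAMMA_MAX = 0.99  # discount for a healthy agent (assumed value)
GAMMA_MIN = 0.50  # floor for a fully degraded agent (assumed value)

def discount_for_agent(degradation: float) -> float:
    """Map a degradation degree in [0, 1] to a per-agent discount.

    A more degraded (noisier) agent receives a smaller discount
    factor, i.e. it weights long-horizon returns less.
    """
    degradation = float(np.clip(degradation, 0.0, 1.0))
    return GAMMA_MAX - degradation * (GAMMA_MAX - GAMMA_MIN)

def td_target(reward: float, next_value: float, degradation: float) -> float:
    """One-step TD target using the agent-specific discount factor."""
    gamma_i = discount_for_agent(degradation)
    return reward + gamma_i * next_value

if __name__ == "__main__":
    # A healthy agent and a heavily degraded agent bootstrapping
    # from the same next-state value estimate.
    for d in (0.0, 0.8):
        print(f"degradation={d:.1f} -> gamma={discount_for_agent(d):.3f}, "
              f"target={td_target(1.0, 10.0, d):.3f}")
```

Under these assumptions, the degraded agent's TD target discounts future value more heavily, which shortens its effective planning horizon while healthy agents continue to plan long-term.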
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Matthieu_Geist1
Submission Number: 784