DSDF: Coordinated look-ahead strategy in stochastic multi-agent reinforcement learning

Satheesh K Perepu; Kaushik Dey

DSDF: Coordinated look-ahead strategy in stochastic multi-agent reinforcement learning

Satheesh K Perepu, Kaushik Dey

29 Sept 2021 (modified: 13 Feb 2023)ICLR 2022 Conference Withdrawn SubmissionReaders: Everyone

Keywords: reinforcement learning, multi-agent reinforcement learning, stochastic actions, poor co-ordination

Abstract: Multi-Agent reinforcement learning has received lot of attention in recent years and have applications in many different areas. Existing methods involving Centralized Training and Decentralized execution, attempts to train the agents towards learning a pattern of coordinated actions to arrive at optimal joint policy. However if some agents are stochastic in their action to varying degrees, the above methods provides poor coordination among agents. In this paper we show how the stochasticity of agents, which could be a result of malfunction or aging of robots, can add to the uncertainty in coordination and thereby contribute to unsatisfactory global rewards. In such a scenario, the deterministic agents have to understand the behavior and limitations of the stochastic agents while the stochastic agents have to plan taking in cognizance their own limitations. Our proposed method, Deep Stochastic Discounted Factor (DSDF), tunes the discounted factor for the agents by using a learning representation of uncertainty to update the utility networks of individual agents. DSDF also helps in imparting an extent of reliability in coordination thereby granting stochastic agents tasks which are immediate and of shorter trajectory with deterministic ones taking the tasks which involve longer planning. Results on benchmark environments shows the efficacy of the proposed approach when compared with existing deterministic approaches.

One-sentence Summary: In this work, we came with joint look-ahead strategy for collaborative multi-agent reinforcement learning where some of the agents are stochastic.

16 Replies

Loading