Abstract: In a multi-agent system, an agent can normally access only part of the state information (partial observability), and the behaviors of other agents may keep changing during training (stochasticity). Agents can obtain more information via communication to better understand the state and the behavior of others. However, the coordination problem remains, since agents sometimes incorrectly infer others' actions from their observations. Nor can agents communicate their actions to one another simultaneously, since every agent would then need to make its decision based on the others' actions, leading to circular dependencies. In this paper, we propose a novel multi-level communication scheme, \textit{Sequential Communication} (SeqComm). SeqComm treats agents asynchronously: each agent is assigned a different decision-making priority, and the higher its priority, the higher its level. SeqComm has two communication phases. The negotiation phase determines the agents' decision-making priorities. Agents first communicate the hidden states of their observations to one another. Then, agents communicate and compare the values of their intentions to determine the priority of decision-making. The value of each intention is the reward that a learned world model (which models the environmental dynamics) predicts for the agent's future behavior without considering others. In the launching phase, the upper-level agents take the lead in making decisions and then communicate their actions to the lower-level agents. Theoretically, we prove that the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in a variety of cooperative multi-agent tasks.
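The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `negotiate` and `launch`, the `Policy` type, and the toy policy are hypothetical, the intention values are passed in as plain numbers rather than computed by a learned world model, and the rule that a larger intention value means higher priority is an assumption made only for this example.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical policy type: maps (own observation, actions already announced
# by higher-priority agents) to an action.
Policy = Callable[[float, Dict[int, int]], int]


def negotiate(intention_values: Sequence[float]) -> List[int]:
    """Negotiation phase: order agents from highest to lowest priority.

    Assumption: a larger intention value gives a higher priority, and ties
    are broken by agent index. The values stand in for world-model outputs.
    """
    return sorted(range(len(intention_values)),
                  key=lambda i: (-intention_values[i], i))


def launch(order: Sequence[int],
           observations: Sequence[float],
           policies: Sequence[Policy]) -> Dict[int, int]:
    """Launching phase: agents decide one at a time in priority order.

    Each agent sees only the actions of agents that decided before it,
    so no agent waits on a lower-level agent (no circular dependency).
    """
    actions: Dict[int, int] = {}
    for agent in order:
        # Pass a copy so a policy cannot modify the shared action record.
        actions[agent] = policies[agent](observations[agent], dict(actions))
    return actions


def toy_policy(observation: float, upper_actions: Dict[int, int]) -> int:
    """Toy policy: the action is the number of upper-level actions received."""
    return len(upper_actions)


order = negotiate([0.2, 0.9, 0.5])
actions = launch(order, [0.0, 0.0, 0.0], [toy_policy] * 3)
print(order)    # [1, 2, 0]
print(actions)  # {1: 0, 2: 1, 0: 2}
```

In this toy run, agent 1 has the largest intention value and decides first with no upper-level actions to condition on, while agent 0 decides last after receiving the actions of both other agents.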
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~DJ_Strouse1
Submission Number: 1792