Abstract: Adequate strategizing of agents' behaviors is essential to solving cooperative MARL problems. One intuitively beneficial yet uncommon method in this domain is predicting agents' future behaviors and planning accordingly. Leveraging this point, we propose a two-level hierarchical architecture that combines a novel information-theoretic objective with a trajectory prediction model to learn a "strategy". To this end, we introduce a latent policy that learns two types of latent strategies, individual z_A and relational z_R, using a modified Graph Attention Network module to extract interaction features. We encourage each agent to behave according to the strategy by conditioning its local Q-function on z_A, and we further equip agents with a shared Q-function that conditions on z_R. Additionally, we introduce two regularizers that encourage the predicted trajectories to be accurate and rewarding. Empirical results on Google Research Football (GRF) and StarCraft II (SC II) micromanagement tasks show that our method establishes a new state of the art: to the best of our knowledge, it is the first MARL algorithm to solve all super-hard SC II scenarios as well as the GRF full game with a win rate above 95%, thus outperforming all existing methods. Videos and a brief overview of the method and results are available at: https://sites.google.com/view/hier-strats-marl
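Since the paper's implementation is not shown here, the following is a minimal, hypothetical PyTorch sketch of how a Graph Attention Network style module could pool pairwise interaction features into a relational latent strategy z_R. All names, dimensions, and the single-layer design are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATRelationalEncoder(nn.Module):
    """Hypothetical sketch: one graph-attention layer over per-agent
    embeddings, mean-pooled into a relational latent strategy z_R."""

    def __init__(self, obs_dim: int, hidden_dim: int, z_dim: int):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden_dim)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # scores pairs [h_i || h_j]
        self.out = nn.Linear(hidden_dim, z_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim) -- one observation vector per agent
        h = F.relu(self.embed(obs))                        # (n, d)
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, -1)               # broadcast h_i
        hj = h.unsqueeze(0).expand(n, n, -1)               # broadcast h_j
        logits = self.attn(torch.cat([hi, hj], -1)).squeeze(-1)  # (n, n)
        alpha = torch.softmax(F.leaky_relu(logits), dim=-1)      # attention weights
        h_rel = alpha @ h                                  # interaction features
        return self.out(h_rel.mean(dim=0))                 # pooled z_R, shape (z_dim,)
```

The resulting z_R could then be fed to a shared Q-function, as in the sketch below.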
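Similarly, a hedged sketch of the conditioning pattern described in the abstract: each agent's local Q-function takes its own observation together with z_A, while a shared Q-function conditions on the global state and z_R. Head shapes, layer sizes, and how the two Q-values are ultimately combined are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class LatentConditionedQ(nn.Module):
    """Hypothetical sketch: a per-agent utility Q_i(obs_i, z_A) plus a
    shared Q(state, z_R). All dimensions here are illustrative."""

    def __init__(self, obs_dim, state_dim, z_dim, n_actions, hidden=64):
        super().__init__()
        self.local_q = nn.Sequential(          # agent-local head, sees z_A
            nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))
        self.shared_q = nn.Sequential(         # shared head, sees z_R
            nn.Linear(state_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs_i, z_a, state, z_r):
        q_local = self.local_q(torch.cat([obs_i, z_a], dim=-1))
        q_shared = self.shared_q(torch.cat([state, z_r], dim=-1))
        return q_local, q_shared   # how these are mixed is paper-specific
```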