$K$-Level Policy Gradients for Multi-Agent Reinforcement Learning

Published: 17 Jul 2025, Last Modified: 06 Sept 2025 · EWRL 2025 Poster · CC BY 4.0
Keywords: multi-agent reinforcement learning, game theory, k-level reasoning, multi-agent systems, policy gradients
TL;DR: A recursive $K$-level reasoning method that improves the efficiency of centralized multi-agent policy gradient algorithms.
Abstract: Actor-critic algorithms for deep multi-agent reinforcement learning (MARL) typically employ a policy update that responds to the current strategies of the other agents. While straightforward, this approach does not account for the updates the other agents make at the same update step, resulting in miscoordination. In this paper, we introduce the $K$-Level Policy Gradient (KPG), a method that recursively updates each agent against the updated policies of the other agents, speeding up the discovery of effective coordinated policies. We theoretically prove that KPG with finite iterates achieves monotonic convergence to a local Nash equilibrium under certain conditions. We provide principled implementations of KPG by applying it to the deep MARL algorithms MAPPO, MADDPG, and FACMAC. Empirically, we demonstrate superior performance over existing deep MARL algorithms in StarCraft II and multi-agent MuJoCo.
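The abstract describes the core mechanism: at each reasoning level, every agent is updated against the level-below, already-updated policies of the other agents rather than their stale current ones. The following is a minimal, illustrative Python sketch of that recursion, not the authors' implementation; the `update_fn` callable (e.g. one MAPPO/MADDPG-style gradient step) and the exact way experience is shared are assumptions for illustration.

```python
import copy

def k_level_update(agents, batch, K, update_fn):
    """One K-level reasoning step (illustrative sketch, not the paper's code).

    agents    : list of per-agent policies
    batch     : collected experience shared by all agents
    K         : number of reasoning levels (K = 0 reduces to the usual
                simultaneous update against the current policies)
    update_fn : hypothetical callable(policy, other_policies, batch) -> new policy,
                e.g. a single centralized actor-critic gradient step
    """
    current = [copy.deepcopy(p) for p in agents]  # level-0 policies
    for _ in range(K):
        updated = []
        for i, policy in enumerate(agents):
            # Agent i best-responds to the *updated* policies of the others
            # from the previous reasoning level, not to their stale versions.
            others = current[:i] + current[i + 1:]
            updated.append(update_fn(copy.deepcopy(policy), others, batch))
        current = updated
    return current
```

With `K = 0` this collapses to the standard simultaneous update; increasing `K` anticipates the other agents' changes within the same update step, which is the coordination effect the abstract claims.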
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Aryaman_Reddi1
Track: Regular Track: unpublished work
Submission Number: 61