Keywords: Self-Play, Diffusion Model, Zero-Sum Games
TL;DR: We propose DiffFP, a diffusion-based fictitious self-play framework that learns robust, multimodal policies in continuous-action multi-agent settings.
Abstract: Reinforcement learning has demonstrated success in learning strategic behaviors in multi-agent settings. However, achieving robust performance in dynamic, continuous state-action games remains a significant challenge. Agents must anticipate diverse and potentially unseen opponent strategies to remain adaptable in multi-agent environments. These challenges often lead to slow convergence or even failure to converge to a Nash equilibrium. To address them, we propose $\textit{DiffFP}$, a fictitious-play (FP) framework that estimates the best response to unseen opponents while learning a robust and multimodal behavioral policy. Specifically, we approximate the best response using a diffusion policy that leverages generative modeling to learn adaptive and multimodal strategies. Through extensive empirical evaluation, we demonstrate that the proposed FP framework converges towards an approximate Nash equilibrium in continuous state-action zero-sum games. We validate our method on complex multi-agent environments, including racing and multi-particle dynamic games. Our results show that the learned policies are robust against diverse opponents and outperform baseline reinforcement learning policies. Our approach achieves up to 3$\times$ faster convergence and 30$\times$ higher success rates on average against RL baselines, demonstrating its robustness to opponent strategies and stability across training iterations.
Paper Type: New Short Paper
Submission Number: 5
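To make the fictitious-play loop described in the abstract concrete, the sketch below runs classical fictitious play on a toy two-player zero-sum matrix game (matching pennies). It is purely illustrative and not the authors' implementation: DiffFP would replace the exact best-response step with a learned diffusion policy over continuous state-action spaces, and every name and constant here is an assumption for the example.

```python
# Minimal sketch of fictitious play on a zero-sum matrix game (matching pennies).
# DiffFP follows the same outer loop, but in continuous state-action games the
# exact best_response() below is approximated by a trained diffusion policy.
import numpy as np

# Row player's payoff matrix; the column player receives -A (zero-sum).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def best_response(payoff, opponent_avg):
    """Pure-strategy best response to the opponent's empirical average strategy."""
    return np.argmax(payoff @ opponent_avg)

row_counts = np.ones(2)  # empirical action counts (uniform prior)
col_counts = np.ones(2)

for _ in range(10_000):
    row_avg = row_counts / row_counts.sum()
    col_avg = col_counts / col_counts.sum()
    # Each player best-responds to the opponent's average strategy, then the
    # averages are updated -- the fictitious-play iteration.
    row_counts[best_response(A, col_avg)] += 1
    col_counts[best_response(-A.T, row_avg)] += 1

print(row_counts / row_counts.sum())  # approaches [0.5, 0.5], the Nash equilibrium
print(col_counts / col_counts.sum())
```

In a zero-sum game like this, the empirical average strategies converge to a Nash equilibrium; the paper's claim is that this convergence behavior carries over, approximately, when the best response is estimated with a diffusion policy in continuous state-action games.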