Keywords: multi-agent reinforcement learning, uncertainty quantification, robust RL, CVaR, risk-aware equilibrium, Nash equilibrium, ensemble critics, trust region, exploitability, distribution shift
TL;DR: We combine ensemble-based uncertainty estimates with CVaR objectives and a risk-aware NashConv measure to reach an ε-RNE; across diverse tasks the method maintains in-distribution (ID) returns while markedly reducing catastrophic tail failures.
Abstract: In multi-agent systems, uncertainty is not a side effect: it is the default state of the world. Stochastic environment dynamics, partial observability, and shifting opponent strategies make exact payoffs hard to know. The classical Nash equilibrium assumes every agent knows the exact, noise-free payoff of every strategy, which rarely holds in practice. To close this gap, we introduce the ε-Robust Nash Equilibrium (ε-RNE), a new solution concept that explicitly accounts for uncertainty through coherent risk measures, instantiated here with Conditional Value-at-Risk (CVaR). Under an ε-RNE, no single agent can improve its risk-adjusted outcome by more than ε through a unilateral deviation, which makes strategies more reliable in noisy settings. We design a decentralized learning algorithm that combines deep-ensemble uncertainty estimation, risk-sensitive value estimation, and targeted policy updates, with an approximate best-response check to track progress toward equilibrium. We prove convergence to an ε-RNE under standard smoothness and bounded-variance assumptions, and show that the concept recovers the classical Nash equilibrium as uncertainty vanishes or ε → 0. Across cooperative, competitive, and mixed-motive tasks, our method consistently reduces exploitability, lowers return variance, and handles distribution shift better than risk-neutral and uncertainty-agnostic baselines. These results demonstrate that directly modeling uncertainty yields more stable and trustworthy multi-agent coordination.
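The abstract states the ε-RNE condition and the risk-aware NashConv only in words; one possible formalization, with notation we introduce here for illustration (the symbols ρ_i^α, G_i, and VaR_α are our assumptions, not fixed by the paper), is:

```latex
% Assumed notation (not fixed by the abstract):
%   \pi = (\pi_i, \pi_{-i})  joint policy;  G_i(\pi)  random return of agent i;
%   \alpha \in (0,1]  tail level; lower-tail CVaR of returns (continuous case).
\[
  \rho_i^{\alpha}(\pi)
    = \mathrm{CVaR}_{\alpha}\!\bigl[G_i(\pi)\bigr]
    = \mathbb{E}\bigl[G_i(\pi) \,\big|\, G_i(\pi) \le \mathrm{VaR}_{\alpha}(G_i(\pi))\bigr].
\]
% A joint policy \pi^{\ast} is an \epsilon-RNE if no unilateral deviation
% improves the risk-adjusted value by more than \epsilon:
\[
  \rho_i^{\alpha}(\pi_i^{\ast}, \pi_{-i}^{\ast})
    \;\ge\; \sup_{\pi_i'} \rho_i^{\alpha}(\pi_i', \pi_{-i}^{\ast}) - \epsilon
  \qquad \text{for every agent } i.
\]
% The risk-aware NashConv sums per-agent deviation gains; \pi is an
% \epsilon-RNE exactly when each summand is at most \epsilon:
\[
  \mathrm{NashConv}_{\rho}(\pi)
    = \sum_{i} \Bigl[\, \sup_{\pi_i'} \rho_i^{\alpha}(\pi_i', \pi_{-i}) - \rho_i^{\alpha}(\pi) \Bigr].
\]
```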
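Likewise, the pairing of deep-ensemble uncertainty estimation with risk-sensitive value estimation admits a simple sketch: treat the ensemble critics' estimates for a (state, action) pair as samples of the uncertain return and average the worst α-fraction. The Python snippet below is a hypothetical illustration under that assumption (the function name `cvar_from_ensemble` and the sampling interpretation are ours), not the paper's actual implementation:

```python
import numpy as np

def cvar_from_ensemble(q_values: np.ndarray, alpha: float = 0.1) -> float:
    """Empirical lower-tail CVaR_alpha over an ensemble of critic estimates.

    Treats the K ensemble Q-values as samples of the uncertain return and
    averages the worst ceil(alpha * K) of them, yielding a pessimistic,
    risk-adjusted value.
    """
    sorted_q = np.sort(q_values)                   # ascending: worst first
    k = max(1, int(np.ceil(alpha * sorted_q.size)))
    return float(sorted_q[:k].mean())              # mean of the alpha-tail

# Example: 10 bootstrapped critics disagree about one (state, action) pair.
ensemble_q = np.random.default_rng(0).normal(loc=1.0, scale=0.5, size=10)
print(cvar_from_ensemble(ensemble_q, alpha=0.2))   # risk-adjusted value
```

Policy updates driven by such a tail-averaged value penalize actions whose ensemble estimates disagree widely, which is one natural reading of "risk-sensitive value calculation" in the abstract.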
Submission Number: 74