Keywords: Multi-Agent Reinforcement Learning, Policy Fusion, Kalman Filter, Online Adaptation, Uncertainty Estimation, Ensemble Methods
TL;DR: We propose a Kalman-inspired fusion framework that dynamically integrates multiple MARL policies to minimize decision uncertainty, achieving SOTA performance, including 100% win rates on several of SMAC's most challenging scenarios.
Abstract: Despite rapid advancements in Multi-Agent Reinforcement Learning (MARL), its application to complex, highly stochastic, and dynamic environments has been hindered by limited generalization and robustness, often resulting in degraded performance. To address this challenge, this paper proposes Kalman Policy Fusion (KPF), a novel decision fusion mechanism inspired by the Kalman filter. The core of KPF lies in the learning-oriented adaptive weighting and iterative optimization of policy distributions at test time. It dynamically fuses multi-agent policies while minimizing their differences, thereby improving generalization and robustness in highly stochastic environments and achieving exceptional performance in dynamic adversarial tasks. Furthermore, this work is the first to empirically and systematically demonstrate that the efficacy of the base models, their distributional characteristics, and their mutual complementarity are the key prerequisites that determine the upper bound of fusion performance. Comprehensive evaluations show that our mechanism establishes new SOTA benchmarks across four diverse environments: the StarCraft Multi-Agent Challenge (SMAC) and its successor SMACv2, Google Research Football (GRF), and the Multi-Agent Particle Environment (MPE). Notably, within the complex domain of StarCraft II, KPF achieves a perfect 100% win rate on numerous challenging maps. By optimizing policy weights to approximate an unknown optimal policy, our results validate the efficacy of the Kalman-based approach in MARL decision optimization, offering valuable insights for building more robust and efficient multi-agent systems. Our code will be released on GitHub upon acceptance.
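To make the Kalman-filter inspiration concrete, the sketch below shows the classic two-estimate Kalman update (inverse-variance weighting via a gain term) and a toy extension to fusing categorical action distributions, where each policy's entropy serves as an uncertainty proxy. This is an illustrative assumption, not the paper's actual KPF algorithm: the function names (`kalman_gain_fuse`, `fuse_policies`) and the entropy-based weighting are hypothetical stand-ins for the adaptive weighting described in the abstract.

```python
import numpy as np

def kalman_gain_fuse(mu1, var1, mu2, var2):
    """Classic Kalman update: fuse two scalar estimates with a gain K.

    The fused variance is never larger than either input variance,
    which is the sense in which fusion reduces decision uncertainty.
    """
    k = var1 / (var1 + var2)          # Kalman gain
    mu = mu1 + k * (mu2 - mu1)        # uncertainty-weighted mean
    var = (1.0 - k) * var1            # reduced posterior variance
    return mu, var

def fuse_policies(dists, eps=1e-12):
    """Toy policy fusion: weight categorical action distributions by
    inverse entropy (a simple uncertainty proxy), then renormalize.

    dists: array of shape (n_policies, n_actions), rows sum to 1.
    """
    dists = np.asarray(dists, dtype=float)
    entropies = -(dists * np.log(dists + eps)).sum(axis=1)
    weights = 1.0 / (entropies + eps)  # more confident -> larger weight
    weights /= weights.sum()
    fused = (weights[:, None] * dists).sum(axis=0)
    return fused / fused.sum()

# Example: a confident policy dominates an uncertain one.
fused = fuse_policies([[0.9, 0.1], [0.5, 0.5]])
```

In this toy example, the fused distribution leans toward the low-entropy policy's preferred action while remaining a valid probability distribution, mirroring the abstract's goal of weighting policies to minimize decision uncertainty.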
Primary Area: reinforcement learning
Submission Number: 11035