Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems

Published: 19 Dec 2025, Last Modified: 05 Jan 2026, Venue: AAMAS 2026 Full, Readers: Everyone, License: CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning (MARL), Explainability / XAI, Cascading Failures, Patient Zero Detection, Influence Attribution
TL;DR: A gradient-based explainer for MARL that detects Patient-Zero and reconstructs the failure cascade via critic-gradient influence, validated with high- vs low-influence interventions.
Abstract: Multi-Agent Reinforcement Learning (MARL) is entering safety-critical domains, yet post-hoc explanations of how a failure began and spread remain scarce. We present a two-stage, gradient-based forensic framework that answers three questions: (1) who failed first (Patient-0)? (2) why might a non-attacked agent be flagged first (validation/correction of early flags)? and (3) how did failure propagate across agents? Stage 1 performs per-agent onset detection via a Taylor-remainder probe of a policy-gradient cost, declaring Patient-0 at the first threshold crossing. Stage 2 validates or corrects this candidate by tracing accelerating upstream influence with critic derivatives: first-order sensitivity and directional second-order curvature are aggregated over a short causal window to yield a directed contagion graph. This directional analysis explains “downstream-first” anomalies by revealing critic-geometry pathways that amplify upstream deviations. Across 500 episodes in Simple Spread with 3 and 5 agents (cooperative navigation) and 100 episodes in StarCraft II, using MADDPG and HATRPO, our method attains 88.2–99.4% Patient-0 detection accuracy and provides plausible geometric explanations for anomalous flags. By moving beyond reward attribution to gradient-level forensics, the framework offers practical tools for diagnosing and mitigating cascading failures in MARL systems.
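Illustrative sketch (not the authors' implementation): the abstract does not specify the policy-gradient cost, the threshold, or the causal window, so the Python below only illustrates the three ingredients it names, using toy functions and finite differences in place of automatic differentiation. All function names (`taylor_remainder`, `first_crossing`, `influence`) and parameters are hypothetical.

```python
import numpy as np

def taylor_remainder(f, x, d, eps=1e-5):
    """First-order Taylor remainder |f(x+d) - f(x) - grad f(x) . d|.

    The gradient is estimated by central differences (an assumption;
    a real system would likely use autodiff on the policy-gradient cost).
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    grad = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        grad[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return abs(f(x + d) - f(x) - grad @ d)

def first_crossing(scores, threshold):
    """Return (agent, step) of the earliest threshold crossing, or None.

    scores: array of shape [T, N], one probe score per step and agent.
    Ties are broken in favour of the lowest agent index.
    """
    scores = np.asarray(scores, dtype=float)
    crossed = scores > threshold
    if not crossed.any():
        return None
    # Agents that never cross get the sentinel step T.
    steps = np.where(crossed.any(axis=0), crossed.argmax(axis=0), scores.shape[0])
    agent = int(steps.argmin())
    return agent, int(steps[agent])

def influence(critic, actions, j, direction, eps=1e-3):
    """Sensitivity and directional curvature of a critic w.r.t. agent j.

    actions: array [N, action_dim]; direction: array [action_dim].
    Returns (first-order directional derivative, second-order directional
    curvature), both via central differences along `direction`.
    """
    a = np.asarray(actions, dtype=float)
    step = np.zeros_like(a)
    step[j] = direction
    q_plus, q_0, q_minus = critic(a + eps * step), critic(a), critic(a - eps * step)
    first = (q_plus - q_minus) / (2 * eps)
    second = (q_plus - 2 * q_0 + q_minus) / eps**2
    return first, second
```

In this sketch, the Patient-0 candidate is the agent returned by `first_crossing` on the per-agent remainder scores. A directed contagion graph could then be assembled by evaluating `influence` for each ordered agent pair over a short window of steps; how those values are aggregated into edges is not stated in the abstract and is left open here.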
Area: Engineering and Analysis of Multiagent Systems (EMAS)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1291