Diagnosing Dec-POMDP Requirements in Cooperative MARL

Published: 19 Dec 2025, Last Modified: 05 Jan 2026, AAMAS 2026 Full Paper, CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning, Cooperative Multi-Agent Reinforcement Learning, Dec-POMDPs
TL;DR: We introduce and apply a diagnostic suite that audits whether cooperative MARL benchmarks genuinely require reasoning under partial observability and decentralisation.
Abstract: Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decentralisation. Genuinely solving such tasks requires Dec-POMDP reasoning, where agents use history to infer hidden state and coordinate based on local information. Yet it remains unclear whether popular benchmarks actually demand this level of reasoning or permit success via shortcuts, such as learning policies that ignore observations entirely. We introduce a diagnostic suite for MARL that uses statistically grounded performance comparisons and information-theoretic probes to audit whether environments truly require such reasoning. We apply it to 37 popular MARL scenarios across MPE, SMAX (v1 and v2), Overcooked (v1 and v2), Hanabi, and MaBrax, and the diagnostics reveal which tasks genuinely require Dec-POMDP reasoning and which admit shortcut solutions. We find that some widely used benchmarks may not adequately test core Dec-POMDP assumptions, potentially leading to over-optimistic assessments of progress. We therefore advocate careful environment design and release diagnostic tooling to ensure that success reflects genuine multi-agent coordination.
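To make the kind of audit described in the abstract concrete, the sketch below illustrates two such diagnostics; it is not the authors' released tooling, and all function names and data are hypothetical. It assumes you already have per-episode evaluation returns for a normal policy and for a "blind" policy whose observations are masked, compares them with Welch's t-test, and adds a crude plug-in estimate of the mutual information between an agent's observations and its chosen actions (near-zero MI suggests an effectively open-loop shortcut policy).

```python
# Minimal sketch of two Dec-POMDP diagnostics (hypothetical, not the paper's released suite).
import numpy as np
from scipy import stats


def performance_gap_test(full_returns, blind_returns, alpha=0.05):
    """Welch's t-test on episode returns: does ignoring observations hurt performance?"""
    t_stat, p_value = stats.ttest_ind(full_returns, blind_returns, equal_var=False)
    gap = float(np.mean(full_returns) - np.mean(blind_returns))
    return {
        "mean_gap": gap,
        "p_value": float(p_value),
        "requires_observations": bool(gap > 0 and p_value < alpha),
    }


def observation_action_mi(observations, actions, n_bins=8):
    """Crude plug-in estimate of I(obs; action) from rollout data.

    Each observation vector is reduced to one discretised feature so a joint
    histogram with the discrete action can be formed.
    """
    obs_summary = observations.sum(axis=1)
    edges = np.histogram_bin_edges(obs_summary, bins=n_bins)
    obs_feature = np.digitize(obs_summary, bins=edges)  # values in 0 .. n_bins + 1
    joint, _, _ = np.histogram2d(
        obs_feature, actions, bins=(n_bins + 2, int(actions.max()) + 1)
    )
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nonzero = p_xy > 0
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for evaluation rollouts.
    full_returns = rng.normal(10.0, 2.0, size=200)
    blind_returns = rng.normal(6.0, 2.0, size=200)
    print(performance_gap_test(full_returns, blind_returns))

    obs = rng.normal(size=(5000, 4))
    acts = (obs.sum(axis=1) > 0).astype(int)  # actions that depend on observations
    print("I(obs; action) ~", observation_action_mi(obs, acts))
```

In this toy setup, a large significant gap between full and blind returns, together with non-trivial observation-action mutual information, would indicate the scenario actually rewards using observations rather than admitting an open-loop shortcut.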
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1235