Keywords: Reinforcement Learning, Ad Hoc Teamwork, Multi-Agent Learning, Shapley Value
TL;DR: We propose an axiomatic game-theoretic framework for n-agent ad hoc teamwork (NAHT), showing how the axioms characterizing the Shapley value guide the design of reinforcement learning algorithms with improved performance.
Abstract: Open multi-agent systems are increasingly relevant for modelling emerging real-world domains such as smart grids and swarm robotics. This paper addresses the recently posed problem of n-agent ad hoc teamwork (NAHT), where only a subset of agents is controllable. Existing approaches rely on heuristic designs that lack theoretical grounding. We propose an axiomatic game-theoretic framework for NAHT, formulated via the state-specific cooperative game space. Within this framework, the axiomatic characterization of the Shapley value (Efficiency, Symmetry, and Linearity) is reinterpreted as a set of structural constraints on individual value functions. This yields a principled design space: enforcing all three axioms recovers the Shapley value, while dropping Efficiency yields the Banzhaf index. As concrete instantiations, we develop the Shapley Machine and the Banzhaf Machine, which enforce these two subsets of axioms during learning. Implemented on top of IPPO and POAM, these algorithms deliver stronger performance than their base methods; notably, relaxing the Efficiency axiom may even outperform enforcing the full set in terms of agent-type generalization.
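For reference, a minimal sketch of the two solution concepts named above, in their standard cooperative-game form over a game $(N, v)$ with player set $N$ and characteristic function $v$; the paper's state-specific formulation additionally conditions these quantities on the environment state, which is not shown here:

$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]$ (Shapley value)

$\beta_i(v) = \frac{1}{2^{|N|-1}} \sum_{S \subseteq N \setminus \{i\}} \bigl[v(S \cup \{i\}) - v(S)\bigr]$ (Banzhaf index)

Both aggregate the same marginal contributions $v(S \cup \{i\}) - v(S)$; the Shapley value weights coalitions so that Efficiency holds ($\sum_{i \in N} \phi_i(v) = v(N)$), whereas the Banzhaf index averages them uniformly and in general does not satisfy Efficiency, mirroring the axiom the Banzhaf Machine relaxes.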
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 12937