Keywords: Reinforcement Learning, Ad Hoc Teamwork, Multi-Agent Learning, Shapley Value
TL;DR: We propose an axiomatic framework for NAHT, showing how Shapley’s axioms guide reinforcement learning algorithms that improve performance.
Abstract: Open multi-agent systems are increasingly relevant for modelling emerging real-world domains such as smart grids and swarm robotics. This paper addresses the recently posed problem of n-agent ad hoc teamwork (NAHT), in which only a subset of agents is controllable. We propose an axiomatic game-theoretic framework for NAHT, formulated via a cooperative game model that differentiates the learning objective of NAHT from that of multi-agent reinforcement learning (MARL). Within this framework, the axiomatic characterization of the Shapley value (Efficiency, Symmetry, and Linearity) is reinterpreted as a set of structural constraints on individual value functions. This yields a principled design space: enforcing all axioms recovers the Shapley value, while dropping Efficiency yields the Banzhaf index. As concrete instantiations, we develop the Shapley Machine and the Banzhaf Machine, which enforce different subsets of axioms during learning. Implemented on top of IPPO and POAM, these algorithms achieve stronger empirical performance than their base methods.
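The Shapley/Banzhaf distinction the abstract draws can be illustrated on a toy cooperative game. The sketch below (a hypothetical 3-player characteristic function, not from the paper) computes both solution concepts as averages of marginal contributions: the Shapley value uses coalition-size-dependent weights and satisfies Efficiency (values sum to the grand-coalition payoff), while the Banzhaf index averages marginals uniformly and drops Efficiency.

```python
from itertools import combinations
from math import factorial

# Hypothetical characteristic function v over coalitions of players {0, 1, 2}.
v = {
    (): 0, (0,): 1, (1,): 1, (2,): 2,
    (0, 1): 3, (0, 2): 4, (1, 2): 4,
    (0, 1, 2): 6,
}
players = [0, 1, 2]
n = len(players)

def coalitions_without(i):
    """All coalitions that exclude player i (including the empty one)."""
    others = [p for p in players if p != i]
    for r in range(len(others) + 1):
        yield from combinations(others, r)

def shapley(i):
    # Size-weighted average of marginal contributions; satisfies Efficiency.
    total = 0.0
    for c in coalitions_without(i):
        s = len(c)
        weight = factorial(s) * factorial(n - s - 1) / factorial(n)
        total += weight * (v[tuple(sorted(c + (i,)))] - v[c])
    return total

def banzhaf(i):
    # Uniform average of marginal contributions; Efficiency is dropped.
    margs = [v[tuple(sorted(c + (i,)))] - v[c] for c in coalitions_without(i)]
    return sum(margs) / len(margs)

shap = [shapley(i) for i in players]
banz = [banzhaf(i) for i in players]
print("Shapley:", shap, "sum =", sum(shap))  # sum equals v(grand coalition) = 6
print("Banzhaf:", banz, "sum =", sum(banz))  # sum need not equal 6
```

In this game, players 0 and 1 are interchangeable, so both indices assign them equal value (Symmetry); only the Shapley values sum to the grand-coalition payoff of 6.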
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 12937