Keywords: Dynamical Systems, Multi-Agent Reinforcement Learning, Temporal-Difference Learning, Partial Observability
TL;DR: To bridge complex systems science and multi-agent machine learning, we present a mathematically efficient description of multi-agent reinforcement learning under partial observability as a deterministic dynamical system.
Abstract: Complex adaptive systems occur in all domains and across all scales, from cells to societies. Yet the question of how the various forms of collective behavior emerge from individual behavior and feed back to influence those individuals remains open. Complex systems theory focuses on patterns emerging from deliberately simple individuals, whereas fields such as machine learning and cognitive science emphasize individual capabilities and pay little attention to the collective level. To date, little work has gone into modeling the effects of changing and uncertain environments on the collective behavior that emerges from individually self-learning agents. To this end, we derive and present deterministic memory mean-field temporal-difference reinforcement learning dynamics in which the agents only partially observe the actual state of the environment. This paper aims to obtain an efficient mathematical description of the emergent behavior of biologically plausible and parsimonious learning agents for the typical case of environmental and perceptual uncertainty. We showcase the broad applicability of our dynamics across different classes of agent-environment systems, highlight emergent effects caused by partial observability, and show how our method enables the application of dynamical systems theory to partially observable multi-agent learning. The presented dynamics have the potential to become a formal yet practical, lightweight, and robust tool for researchers in biology, social science, and machine learning to systematically investigate the effects of interacting agents that only partially observe their environment.