Keywords: Energy-based Models, Multi-Agent Learning, Surprise Minimization
TL;DR: Surprise minimization in multi-agent learning can be achieved with a temporal EBM estimating the change in fast-paced dynamics.
Abstract: Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success by virtue of collaboration across agents. Recent work, on the other hand, introduces surprise which quantifies the degree of change in an agent’s environment. Surprise-based learning has received significant attention in the case of single-agent entropic settings but remains an open problem for fast-paced dynamics in multi-agent scenarios. A potential alternative to address surprise may be realized through the lens of free-energy minimization. We explore surprise minimization in multi-agent learning by utilizing the free energy across all agents in a multi-agent system. A temporal Energy-Based Model (EBM) represents an estimate of surprise which is minimized over the joint agent distribution. Our formulation of the EBM is theoretically akin to the minimum conjugate entropy objective and highlights suitable convergence towards minimum surprising states. We further validate our theoretical claims in an empirical study of multi-agent tasks demanding collaboration in the presence of fast-paced dynamics. Our implementation and agent videos are available at the Project Webpage.
Supplementary Material: pdf