Keywords: In-context learning, opponent modelling, mirror descent, information geometry, multi-agent reinforcement learning, Predictive Equilibrium
Abstract: Multi-agent learning of agentic models faces a fundamental tension: agents must learn to adapt efficiently to their opponents at test time. Recent work has shown that sequence models can learn to infer opponent strategies in-context, that is, from the interaction history alone. However, the mechanism behind this behaviour remains poorly understood despite the empirical evidence for it. In this position paper, we argue that the in-context learning of transformer-based multi-agent policies can be viewed as entropy-regularised mirror descent on the Fisher-Rao manifold of opponent strategies. Building on the findings of \citet{d2026transformers}, who give a constructive proof that transformers implement mirror descent for latent mixture models, we identify opponent types with the latent variables and interaction histories with the observed sequences. Under this identification, each attention layer can be interpreted as performing an implicit step of belief updating over opponent prototypes, with the softmax attention weights serving as the updated mixture weights. The fixed points of these dynamics correspond to self-consistent embedded Predictive Equilibria \citep{meulemans2025embedded,weis2026multi}. This position suggests that standard self-supervised interaction-sequence prediction on diverse opponent pools may suffice to induce theory-of-mind-like opponent reasoning, bridging the gap between agent modelling and acting.
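The belief-updating reading in the abstract can be illustrated with a toy example. The sketch below is not the construction of \citet{d2026transformers}; it only shows the standard fact that an entropy-regularised mirror descent (exponentiated-gradient) step on the simplex with unit step size and a log-likelihood signal coincides with a Bayesian update of mixture weights, and that the accumulated result is a softmax over summed log-likelihood scores. The opponent prototypes, the action history, and the function names are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mirror_descent_step(belief, log_lik, step_size=1.0):
    """One exponentiated-gradient step on the probability simplex.

    With the negative-entropy mirror map, the update is
    new_belief proportional to belief * exp(step_size * log_lik).
    For step_size = 1 this is exactly Bayes' rule.
    """
    return softmax(np.log(belief) + step_size * log_lik)

# Hypothetical setup: 3 opponent prototypes, each a fixed
# distribution over 2 actions (rows sum to 1).
prototypes = np.array([[0.9, 0.1],
                       [0.5, 0.5],
                       [0.1, 0.9]])

# Illustrative interaction history: observed opponent actions.
history = [1, 1, 0, 1, 1]

# Sequential view: one belief update per observed action.
prior = np.full(3, 1.0 / 3.0)
belief = prior.copy()
for action in history:
    belief = mirror_descent_step(belief, np.log(prototypes[:, action]))

# One-shot view: softmax over log-prior plus summed log-likelihood
# scores, analogous to attention weights over prototypes.
scores = np.log(prototypes[:, history]).sum(axis=1)
one_shot = softmax(np.log(prior) + scores)
```

Here `belief` and `one_shot` agree up to floating-point error, and the prototype that mostly plays action 1 receives the largest weight. A step size other than 1 would give a tempered update rather than exact Bayesian inference.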
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Paper Type: Standard paper
Submission Number: 58