LOSI: Improving Multi-agent Reinforcement Learning via Latent Opponent Strategy Identification

ICLR 2026 Conference Submission 19387 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-agent Reinforcement Learning, Opponent Strategy Identification, Contrastive Learning, SMAC-Hard
TL;DR: An unsupervised framework that identifies opponent strategies in real-time for robust multi-agent collaboration.
Abstract: In collaborative Multi-Agent Reinforcement Learning (MARL), agents must contend with non-stationarity introduced not only by teammates’ concurrent decisions but also by partially observable and diverse opponent strategies. Although recent MARL algorithms have achieved strong performance in complex decision-making tasks, they often overfit to specific opponent behaviors, resulting in sharp performance drops when encountering previously unseen strategies. To overcome this limitation, we propose Latent Opponent Strategy Identification (LOSI), an unsupervised framework that identifies and adapts to opponent strategies in real time without requiring explicit supervision. LOSI employs a trajectory encoder trained with a contrastive learning objective (InfoNCE) to map opponent behaviors into compact and discriminative embeddings. These embeddings then condition both the MARL policy and the mixing network, enabling adaptive and robust decision-making. Experimental results on challenging SMAC-Hard scenarios with mixed opponent strategies demonstrate that LOSI substantially improves generalization and performs on par with or better than strong MARL baselines. Further analysis of the learned embedding space reveals meaningful clustering of trajectories by opponent strategy, even in the absence of ground-truth labels.
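The contrastive objective named in the abstract (InfoNCE) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the cosine-similarity formulation, the temperature value, and the convention that matching trajectory pairs sit on the diagonal of the similarity matrix are all assumptions made here for clarity.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE loss over a batch of trajectory embeddings.

    anchors, positives: arrays of shape (N, d). Row i of `positives` is the
    positive pair for row i of `anchors` (e.g. two trajectory segments from
    the same opponent strategy); all other rows serve as in-batch negatives.
    """
    # L2-normalise so the dot product becomes cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pairs) as the targets:
    # pull each anchor toward its positive, push it from in-batch negatives.
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls embeddings of trajectories generated by the same opponent strategy together while pushing apart those from different strategies, which is consistent with the clustering behavior the abstract reports in the learned embedding space.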
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19387