Keywords: Opponent Modeling, Open Set, Reinforcement Learning
TL;DR: This paper presents a new training approach called Open Set Opponent Modeling (OSOM), which enables AI agents to effectively identify and respond to previously unseen opponents in multi-agent systems.
Abstract: In multi-agent systems, opponent modeling aims to reduce environmental uncertainty by modeling other agents. Existing research has used opponent information to enhance decision-making through a variety of methodologies. However, these approaches generally generalize poorly when opponents draw from an open set of policies; in particular, no prior work has managed to effectively identify never-before-seen opponents. To address these issues, we propose an end-to-end Open Set Opponent Modeling (OSOM) training approach, which for the first time enables explicit identification of and response to open set opponents. First, OSOM overcomes the challenges of partial observability by distilling opponent policies into information encodings of the controlled agent through representation learning. Second, using randomly generated opponent type embeddings as prompts, OSOM identifies opponent types whose number and semantics may vary by maximizing the probability of selecting the true opponent type embedding via contrastive learning. Finally, with the aggregated opponent type embeddings selected from recent history as context, OSOM learns to best respond to sampled opponents through online reinforcement learning. At test time, OSOM only needs to randomly generate opponent type embeddings as prompts again to achieve effective on-the-fly identification of and response to non-stationary open set opponents. Extensive controlled experiments in competitive, cooperative, and mixed environments quantitatively validate the significant advantages of OSOM over existing approaches in both identification accuracy and response performance.
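The contrastive identification step described in the abstract can be illustrated with a minimal sketch. The function names, embedding dimensions, and the cosine-similarity scoring below are illustrative assumptions, not the paper's actual implementation: randomly generated opponent-type embeddings serve as prompts, and an InfoNCE-style softmax over similarity scores selects the embedding most likely to match the controlled agent's information encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

def identify_opponent(encoding, type_embeddings, temperature=0.1):
    """Score each candidate opponent-type embedding against the agent's
    information encoding and return a softmax distribution over types
    (InfoNCE-style; training would maximize the true type's probability)."""
    # Cosine similarity between the encoding and every prompt embedding.
    enc = encoding / np.linalg.norm(encoding)
    emb = type_embeddings / np.linalg.norm(type_embeddings, axis=1, keepdims=True)
    logits = emb @ enc / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()

# Randomly generated opponent-type embeddings act as prompts; because they
# are regenerated at test time, the number of types K is free to vary.
K, d = 5, 16
type_embeddings = rng.normal(size=(K, d))

# Hypothetical encoding: a noisy copy of type 2's embedding stands in for
# the representation distilled from the controlled agent's observations.
encoding = type_embeddings[2] + 0.1 * rng.normal(size=d)

probs = identify_opponent(encoding, type_embeddings)
print(probs.argmax())  # the matching prompt (index 2) scores highest
```

In the full method, the selected embeddings aggregated over recent history would then condition the policy trained by online reinforcement learning.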
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 8700