Opponent Transformer: Modeling Opponent Policies as a Sequence Problem

Published: 01 Jun 2024 · Last Modified: 17 Jun 2024 · CoCoMARL 2024 Poster · CC BY 4.0
Keywords: Multi-agent reinforcement learning, opponent modeling, partial observability, task adaptation
TL;DR: We learn an explicit opponent model that reconstructs opponent trajectories from the agent's local trajectory, and use it to train a policy that generalizes across multiple sets of opponent policies.
Abstract: The ability of an agent to understand the intentions of others in a multi-agent system, also called opponent modeling, is critical for the design of effective local control policies. One main challenge is the unavailability of other agents' episodic trajectories at execution time. To address this challenge, we propose a new approach that explicitly models the episodic trajectories of others. In particular, we cast the opponent modeling problem as a sequence modeling problem: a transformer model is conditioned on the sequence of the agent's local trajectory and predicts each opponent agent's trajectory. To evaluate the effectiveness of the proposed approach, we conduct experiments on a set of multi-agent environments that capture both cooperative and competitive payoff structures. The results show that the proposed method provides better opponent modeling capabilities while achieving competitive or superior episodic returns.
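The core idea, conditioning on the agent's local trajectory to predict each opponent's trajectory, can be illustrated with a minimal sketch. The following is an illustrative toy implementation, not the paper's architecture: a single-head causal self-attention layer reads the local trajectory, and a linear head emits one predicted opponent step per local step. All dimensions, weight names, and functions here are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (T, d) local-trajectory embeddings -> (T, d) per-step context."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    # Causal mask: step t may only attend to steps <= t.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    return softmax(scores, axis=-1) @ v

def predict_opponent_trajectory(local_traj, params):
    """Map the agent's local trajectory to a predicted opponent trajectory."""
    h = causal_self_attention(local_traj, params["Wq"], params["Wk"], params["Wv"])
    # Linear head: one predicted opponent feature vector per local step.
    return h @ params["Wout"]

d, d_opp, T = 8, 4, 6  # illustrative embedding size, opponent feature size, horizon
params = {name: rng.normal(size=shape) * 0.1
          for name, shape in [("Wq", (d, d)), ("Wk", (d, d)),
                              ("Wv", (d, d)), ("Wout", (d, d_opp))]}
local_traj = rng.normal(size=(T, d))  # agent's own obs/action embeddings
pred = predict_opponent_trajectory(local_traj, params)
print(pred.shape)  # (6, 4)
```

Because of the causal mask, the prediction at step t depends only on the local trajectory up to step t, matching the constraint that opponent trajectories are unavailable at execution time and must be inferred online from local observations.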
Submission Number: 18