Decision-making with speculative opponent model-aided value function factorization

Jing Sun; Cong Zhang; Zhiguang Cao; Wen Song

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Decision making, Cooperative multi-agent reinforcement learning

TL;DR: This work proposes a novel value-based speculative opponent modeling algorithm that relies solely on local information.

Abstract: In many real-world scenarios, teams of agents must coordinate their actions while competing against opponents. Traditional multi-agent reinforcement learning (MARL) approaches often treat opponents as part of the environment, causing controlled agents to overlook the impact of their adversaries. Opponent modeling can enhance an agent’s decision-making by constructing predictive models of other agents. However, existing approaches typically rely on centralized learning with access to opponent data, and the process of extracting decentralized policies becomes impractical with larger teams. To address this issue, we propose the Distributional Speculative Opponent-aided mixing framework (DSOMIX), a novel value-based speculative opponent modeling algorithm that relies solely on local information—namely the agent's own observations, actions, and rewards. DSOMIX uses speculative beliefs to predict the behaviors of unseen opponents, enabling agents to make decisions based on local observations. Additionally, it incorporates distributional value decomposition models to capture a more granular representation of the agent's return distribution, improving the training process for the speculative opponent models. We formally derive a value-based theorem that underpins the training process. Extensive experiments across four challenging MARL benchmarks, including MPE and Pommerman, demonstrate that DSOMIX outperforms state-of-the-art methods, achieving superior performance and faster convergence.

Primary Area: reinforcement learning

Submission Number: 10875
