Communication in Multiagent Reinforcement Learning via Counterfactual Message Value

Zihong Gao

Published: 16 Oct 2025, Last Modified: 28 Jan 2026 | IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS | Everyone | CC BY 4.0

Abstract: Effective communication is pivotal for successful team collaboration in cooperative multiagent tasks. However, largely because the environment is partially observable, indiscriminate message requests among teammates can confuse individual agents, impeding collaboration and reducing the overall efficiency of the system. Most previous work has employed gates or attention mechanisms to extract relatively important messages. These methods, however, often fail to assess each message’s value explicitly, or they compute it through an intricate and convoluted process. This can result in ineffective communication and miscoordination in complex scenarios. To address these challenges, we introduce a novel metric named counterfactual message value (CMV), which quantifies each message’s contribution to an agent and thereby enables redundant messages to be eliminated. In addition, we present a practical multiagent reinforcement learning (MARL) algorithm, termed CMV communication (CMVC), which facilitates the learning of agent–agent communication protocols. It predicts the CMVs of teammates for an agent based solely on the agent’s local observation, enabling the agent to initiate communication with those exhibiting positive CMVs. To differentiate message impact levels, we design a message aggregator in CMVC that aggregates messages according to their individual CMVs. We evaluate CMVC on various tasks, including cooperative navigation, predator–prey, and the StarCraft multiagent challenge (SMAC). The results indicate that, compared with several existing state-of-the-art methods, CMVC reduces redundant messages and learns more advanced collaboration strategies.
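The abstract does not give the CMV formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a leave-one-out definition (value with all messages minus value with one sender's message removed), keeps only senders with positive CMV, and aggregates their messages with weights proportional to CMV. The names `toy_value`, `TOY_GAIN`, `counterfactual_message_values`, `select_senders`, and `aggregate` are hypothetical, and the toy value function stands in for whatever learned critic or predictor CMVC actually uses.

```python
from typing import Callable, Dict, List, Sequence

Message = Sequence[float]
ValueFn = Callable[[Sequence[float], Dict[str, Message]], float]

# Hypothetical per-sender usefulness, used only to make the toy value function concrete.
TOY_GAIN = {"a": 1.0, "b": -0.5, "c": 3.0}


def toy_value(observation: Sequence[float], messages: Dict[str, Message]) -> float:
    """Toy stand-in for a learned value estimate (assumption, not from the paper)."""
    return sum(TOY_GAIN[sender] for sender in messages)


def counterfactual_message_values(
    value_fn: ValueFn, observation: Sequence[float], messages: Dict[str, Message]
) -> Dict[str, float]:
    """Assumed leave-one-out CMV: value with every message minus value without one."""
    full = value_fn(observation, messages)
    cmvs = {}
    for sender in messages:
        without = {k: v for k, v in messages.items() if k != sender}
        cmvs[sender] = full - value_fn(observation, without)
    return cmvs


def select_senders(cmvs: Dict[str, float]) -> List[str]:
    """Keep only teammates whose message has a positive CMV."""
    return [sender for sender, value in cmvs.items() if value > 0]


def aggregate(messages: Dict[str, Message], cmvs: Dict[str, float]) -> List[float]:
    """Weight each kept message by its share of the total positive CMV (assumed rule)."""
    chosen = select_senders(cmvs)
    if not chosen:
        return []
    total = sum(cmvs[s] for s in chosen)
    dim = len(messages[chosen[0]])
    return [sum(cmvs[s] / total * messages[s][i] for s in chosen) for i in range(dim)]


demo_messages = {"a": [1.0, 0.0], "b": [5.0, 5.0], "c": [0.0, 1.0]}
demo_cmvs = counterfactual_message_values(toy_value, [0.0], demo_messages)
demo_selected = select_senders(demo_cmvs)
demo_aggregate = aggregate(demo_messages, demo_cmvs)
```

In this toy setup, teammate `b` has a negative CMV and is dropped, while `c` receives three times the weight of `a`. Note that CMVC is described as predicting CMVs from the local observation alone, before any message is requested; the sketch computes them directly only to keep the example self-contained.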