Abstract: Efficient collaboration under the centralized training with decentralized execution (CTDE) paradigm remains a challenge in cooperative multi-agent systems. We identify divergent action tendencies among agents as a significant obstacle to CTDE's training efficiency, as a large number of training samples is needed before agents reach a unified consensus on their policies. This divergence stems from the lack of adequate team-consensus guidance signals during credit assignment in CTDE. To address this, we propose Intrinsic Action Tendency Consistency, a novel approach for cooperative multi-agent reinforcement learning. It integrates intrinsic rewards, obtained through an action model, into a reward-additive CTDE (RA-CTDE) framework. We formulate an action model with which surrounding agents predict a central agent's action tendency. Leveraging these predictions, we compute a cooperative intrinsic reward that encourages each agent to match its actions with its neighbors' predictions. We establish the equivalence between RA-CTDE and CTDE through theoretical analysis, demonstrating that CTDE's training process can be carried out using agents' individual targets. Building on this insight, we introduce a novel method to combine intrinsic rewards with CTDE. Extensive experiments on challenging tasks in the StarCraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF) benchmarks demonstrate the improved performance of our method.
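To make the cooperative intrinsic reward concrete, the following minimal sketch illustrates one plausible form it could take: each neighbor's action model outputs a predicted distribution over the central agent's actions, and the intrinsic reward is the average log-probability those predictions assign to the action actually taken. The function names, the neighbor sets, and the log-probability form are illustrative assumptions, not the paper's exact implementation.

    # Minimal sketch (assumptions, not the authors' implementation): intrinsic
    # reward from neighbors' action-tendency predictions for a central agent.
    import numpy as np

    def intrinsic_reward(agent_id, action_taken, neighbor_ids, obs, predict_dist):
        """Average log-probability that neighbors' action models assign to the
        central agent's chosen action; higher when the action matches the
        neighbors' predicted tendency."""
        log_probs = []
        for j in neighbor_ids:
            # Hypothetical action model: neighbor j predicts a distribution
            # over agent_id's actions from its own observation.
            dist = predict_dist(j, agent_id, obs[j])
            log_probs.append(np.log(dist[action_taken] + 1e-8))
        return float(np.mean(log_probs)) if log_probs else 0.0

    # Example usage with a dummy uniform action model over 3 discrete actions.
    def dummy_predict_dist(j, i, obs_j):
        return np.ones(3) / 3.0

    obs = {1: None, 2: None}
    r_int = intrinsic_reward(agent_id=0, action_taken=2, neighbor_ids=[1, 2],
                             obs=obs, predict_dist=dummy_predict_dist)
    print(r_int)  # log(1/3), averaged over the two neighbors

In the RA-CTDE framing described above, such an intrinsic reward would be added to each agent's individual target alongside the environment reward; the exact weighting and target construction follow the paper's theoretical analysis rather than this sketch.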