USE: Enhancing Mixed-Motive Cooperation via Unified Self and Collective Rewards

04 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Reinforcement Learning, Multi-Agent Systems, mixed-motive cooperation, Selfish and Cooperative
Abstract: Mixed-motive cooperation requires agents to balance individual and collective rewards, often creating tension between self-interest and cooperation. Conventional methods typically treat individual and collective rewards as completely independent, solving mixed-motive cooperation by maximizing their weighted sum (sometimes with auxiliary constraints). Because maximizing either the individual or the collective reward alone is sufficient to increase the weighted sum, such a design can lead to incorrect credit assignment and convergence to overly selfish or altruistic policies. To address this, we propose a novel method named Unifying Self and collEctive rewards (USE). USE decomposes the individual reward into an independent part, unaffected by other agents, and a dependent part, shaped by interactions with them, then correlates the individual reward with the collective reward via the dependent part, since both the dependent part and the collective reward arise from agent interactions (including cooperation and betrayal). This coupling turns maximizing individual and collective rewards into intrinsically correlated objectives, so that optimizing one implicitly promotes the other, reducing the risk of convergence to overly selfish or altruistic policies. We conduct extensive experiments in mixed-motive cooperation tasks, demonstrating the effectiveness of USE. Interestingly, we find that the correlation between individual and collective rewards, to a certain extent, reflects the cooperative tendency of the agents. Our code is available at https://anonymous.4open.science/r/QPC-B6FD.
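The contrast between the conventional weighted-sum objective and the coupling idea described in the abstract can be sketched as follows. This is a minimal illustration under assumptions: the function names, the split of the individual reward into `r_indep` and `r_dep`, and the particular coupling term are hypothetical, not the paper's actual formulation.

```python
def weighted_sum_objective(r_ind, r_col, alpha=0.5):
    """Conventional baseline: weighted sum of individual and collective
    rewards. Increasing either term alone increases the objective, which
    is the credit-assignment pitfall the abstract describes."""
    return alpha * r_ind + (1 - alpha) * r_col


def coupled_objective(r_indep, r_dep, r_col, alpha=0.5):
    """Hedged sketch of a USE-style coupling: the individual reward is
    decomposed into an independent part (r_indep, unaffected by others)
    and a dependent part (r_dep, shaped by interaction). The dependent
    part is tied to the collective reward via a multiplicative coupling
    term (a hypothetical choice for illustration), so improving the
    collective reward also raises the individual objective and vice versa."""
    coupled = r_dep * r_col  # assumed coupling form, not from the paper
    return alpha * (r_indep + coupled) + (1 - alpha) * r_col
```

Under the weighted sum, an agent can raise its objective while the collective reward stays flat; under the coupled form, the interaction-dependent portion of the individual return only pays off when the collective reward does too, which is the intrinsic correlation the abstract argues for.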
Primary Area: reinforcement learning
Submission Number: 1864