Learning to Cooperate under Private Rewards

TMLR Paper3024 Authors

19 Jul 2024 (modified: 27 Sept 2024) · Rejected by TMLR · License: CC BY 4.0
Abstract: We address a critical challenge in multi-agent reinforcement learning (MARL): maximizing team rewards in scenarios where agents only have access to their individual, private rewards. This setting presents unique challenges, as agents must cooperate to optimize collective performance whilst having only local, potentially conflicting objectives. Existing MARL methods often tackle this by sharing rewards, values, or full policies, but these approaches raise concerns about privacy and computational overhead. We introduce Anticipation Sharing (AS), a novel MARL method that achieves team-level coordination through the exchange of anticipated peer action distributions. Our key theoretical contribution is a proof that the deviation between the collective return and individual objectives can be identified through these anticipations. This allows AS to align agent behaviours towards team objectives without compromising individual privacy or incurring the prohibitive costs of full policy sharing. Experimental results demonstrate that AS is competitive with baseline algorithms that share values or policy parameters, whilst offering significant advantages in privacy preservation and computational efficiency. Our work presents a promising direction for reward-private cooperative MARL in scenarios where agents must maximize team performance using only their private, individual rewards.
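To make the core idea of exchanging anticipated peer action distributions more concrete, here is a minimal illustrative sketch. It is not the authors' algorithm: the class name `AnticipationAgent`, the KL-based `alignment_penalty`, and the assumption of discrete actions are all hypothetical choices made only to show the kind of quantity that could be shared instead of rewards, values, or policy parameters.

```python
# Illustrative sketch only: names and the KL penalty form are hypothetical,
# not taken from the paper. Assumes discrete actions.
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def kl(p, q, eps=1e-8):
    # KL divergence between two discrete distributions.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

class AnticipationAgent:
    """Keeps its reward and policy private; shares only anticipated peer action distributions."""
    def __init__(self, n_agents, n_actions, agent_id):
        self.id = agent_id
        self.n_agents = n_agents
        self.policy_logits = np.zeros(n_actions)                 # own policy (never shared)
        self.anticipation_logits = np.zeros((n_agents, n_actions))  # anticipations (shared)

    def own_distribution(self):
        return softmax(self.policy_logits)

    def anticipations(self):
        # Anticipated action distribution for every peer (excluding self).
        return {j: softmax(self.anticipation_logits[j])
                for j in range(self.n_agents) if j != self.id}

def alignment_penalty(agents):
    """Total divergence between what each agent anticipates about its peers
    and those peers' own action distributions."""
    total = 0.0
    for a_i in agents:
        for j, anticipated in a_i.anticipations().items():
            total += kl(anticipated, agents[j].own_distribution())
    return total

agents = [AnticipationAgent(n_agents=3, n_actions=4, agent_id=k) for k in range(3)]
print("alignment penalty:", alignment_penalty(agents))
```

In such a sketch, a penalty of this kind could be folded into each agent's local objective so that behaviours drift toward mutually consistent anticipations; how the paper actually ties the anticipations to the gap between collective return and individual objectives is specified by its theoretical analysis, not by this toy example.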
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Lihong_Li1
Submission Number: 3024