Track: Type A (Regular Papers)
Keywords: Reinforcement Learning, Multi-Agent, Explainability
Abstract: Deep Reinforcement Learning has achieved significant progress in complex environments, but its policies remain difficult to interpret due to the opacity of deep neural networks. This challenge becomes even more pronounced in cooperative Multi-Agent Reinforcement Learning, where coordination between agents must be understood alongside individual decision-making. In this work, we investigate whether cooperation is explicitly encoded in agent policies or whether it emerges as a by-product of selfish incentives. We propose a reward decomposition framework that categorizes reward components as cooperative or selfish in order to explain cooperative behaviours, and we apply this method in the Laser Learning Environment, where agents heavily rely on each other to succeed. Our approach enables analysis at two levels: locally, by identifying which incentives dominate specific actions, and globally, by tracking how priorities evolve during training. Overall, our experiments uncover explicit cooperation in key transitions while also exposing the persistence of selfish incentives.
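To make the idea of reward decomposition concrete, here is a minimal illustrative sketch, not the paper's actual method: it assumes a hypothetical per-component reward signal, and the component names (`laser_blocked_for_teammate`, `gem_collected`, etc.) and the `DecomposedReturn` helper are invented for this example.

```python
from dataclasses import dataclass, field

# Hypothetical reward components, split by category (names invented for illustration).
COOPERATIVE = {"laser_blocked_for_teammate", "exit_reached_together"}
SELFISH = {"gem_collected", "own_exit_reached"}

@dataclass
class DecomposedReturn:
    """Accumulates per-component returns so each action's incentive can be attributed."""
    totals: dict = field(default_factory=dict)

    def add(self, component: str, value: float) -> None:
        self.totals[component] = self.totals.get(component, 0.0) + value

    def by_category(self) -> dict:
        """Aggregate component returns into cooperative vs. selfish totals."""
        out = {"cooperative": 0.0, "selfish": 0.0}
        for name, value in self.totals.items():
            if name in COOPERATIVE:
                out["cooperative"] += value
            elif name in SELFISH:
                out["selfish"] += value
        return out

# Record decomposed rewards along a trajectory, then inspect which category
# dominated -- a "local" explanation of the incentives behind chosen actions.
ret = DecomposedReturn()
ret.add("gem_collected", 1.0)
ret.add("laser_blocked_for_teammate", 2.0)
print(ret.by_category())  # {'cooperative': 2.0, 'selfish': 1.0}
```

Tracking these category totals across training checkpoints would give the "global" view described in the abstract: how the balance between cooperative and selfish incentives shifts over time.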
Serve As Reviewer: ~Tom_Lenaerts2
Submission Number: 26