Fairness in Cooperative Multi-objective Multi-agent Reinforcement Learning using Expected Utility
Keywords: Multi-Agent Reinforcement Learning, Multi-Objective Reinforcement Learning, Expected Scalarized Return, Fairness
Abstract: Fairness, understood as equity and compromise across multiple viewpoints, is a necessary consideration in any decision that is evaluated from several, possibly conflicting, perspectives. It is also a property that artificial decision-making agents should uphold to be deployable in real-world problems. However, existing work in sequential decision-making ensures fairness either among agents or among objectives, and struggles with real-world problems that are both multi-agent and multi-objective. Furthermore, research integrating fairness into Multi-Objective Reinforcement Learning (MORL) focuses on optimizing the Scalarized Expected Return (SER) criterion while mostly ignoring the Expected Scalarized Return (ESR) criterion. We argue that fairness in MORL should also be investigated under ESR, since it is often the more suitable criterion for problems where fairness matters. In this paper, we consider the problem of learning objective-wise fair policies in cooperative multi-agent, multi-objective sequential decision-making problems. We propose a first single-policy algorithm that learns efficient decentralized policies while ensuring fairness across objectives under ESR. In this context, we identify a fundamental challenge of multi-agent MORL under ESR: policies are conditioned on the agents' accumulated return, which complicates decentralized learning and hinders fairness. We provide a first way of addressing this issue by extending and adapting algorithms designed for single-agent MORL under ESR to the multi-agent case. Our algorithm is evaluated on discrete and continuous cooperative multi-objective multi-agent control tasks and achieves better performance than the considered baselines.
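For readers unfamiliar with the SER/ESR distinction contrasted in the abstract, the standard formulations from the MORL literature can be sketched as follows (notation assumed here, not taken from the submission: utility function $u$, vector-valued reward $\mathbf{r}_t$, discount factor $\gamma$):

% Sketch of the standard SER vs. ESR criteria (assumed notation: utility u, vector reward r_t, discount gamma)
\[
  \text{SER:} \quad u\!\left(\mathbb{E}\!\left[\textstyle\sum_{t} \gamma^{t}\,\mathbf{r}_{t}\right]\right)
  \qquad \text{vs.} \qquad
  \text{ESR:} \quad \mathbb{E}\!\left[u\!\left(\textstyle\sum_{t} \gamma^{t}\,\mathbf{r}_{t}\right)\right]
\]

Under SER the utility is applied to the expected vector return, whereas under ESR the expectation is taken over the utility of each realized return, which is why ESR is the relevant criterion when every single outcome must be fair.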
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1434