Keywords: Multi-objective reinforcement learning, Deep reinforcement learning, Fair optimization, Welfare functions
Abstract: Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives.
While $\textit{single-policy}$ MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences.
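For reference, a common formulation of the GGF (we assume the standard definition here; the exact weight vector is not specified in the abstract) is
$$\mathrm{GGF}_{\mathbf{w}}(\mathbf{v}) = \sum_{i=1}^{d} w_i\, v_{(i)}, \qquad v_{(1)} \le \cdots \le v_{(d)}, \quad w_1 \ge \cdots \ge w_d \ge 0,$$
where $v_{(i)}$ denotes the objective values sorted in ascending order; placing the largest weights on the worst-off objectives makes the function concave and fairness-inducing.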
To address this limitation, we formalize the fair optimization problem in $\textit{multi-policy}$ MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences.
Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain within the convex coverage set (CCS), an approximate Pareto front obtained under linear scalarization. (2) We demonstrate that non-stationary policies augmented with accrued-reward histories, as well as stochastic policies, improve fairness by adapting dynamically to historical inequities. (3) We propose three novel algorithms: GGF integrated with multi-policy multi-objective Q-learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and a novel extension of the latter for learning stochastic policies.
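To make contribution (2) concrete, the sketch below is an illustrative assumption of ours, not the paper's implementation: it shows one way accrued reward histories could enter GGF-based greedy action selection over vector-valued Q-estimates, so that the agent favors objectives that have fallen behind.

```python
import numpy as np

def ggf(v, weights):
    """Generalized Gini welfare: dot product of the ascending-sorted
    objective values with non-increasing weights, so the worst-off
    objectives receive the largest weights."""
    return float(np.dot(np.sort(v), weights))

def ggf_greedy_action(q_values, accrued, weights):
    """Pick the action maximizing GGF of (accrued vector reward so far +
    vector Q-value), letting the choice adapt to historical inequities.
    q_values: array of shape (n_actions, n_objectives)."""
    scores = [ggf(accrued + q_a, weights) for q_a in q_values]
    return int(np.argmax(scores))

# Hypothetical usage: 3 actions, 2 objectives, Gini-style weights (2/3, 1/3).
q = np.array([[1.0, 0.2], [0.5, 0.6], [0.9, 0.1]])
accrued = np.array([0.0, 1.5])  # objective 2 is ahead so far
print(ggf_greedy_action(q, accrued, np.array([2 / 3, 1 / 3])))
```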
To validate the performance of the proposed algorithms, we perform experiments across multiple domains and compare our methods against state-of-the-art MORL baselines.
The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
Supplementary Material: pdf
Type Of Paper: Full paper (max page 8)
Anonymous Submission: Anonymized submission.
Submission Number: 43