Keywords: Multi-objective reinforcement learning, Deep reinforcement learning, Fair optimization, Welfare functions
Abstract: Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives.
While $\textit{single-policy}$ MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences.
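For reference, a common formulation of the GGF (we assume the standard definition here; the exact weight vector is not specified in the abstract) is
$$\mathrm{GGF}_{\mathbf{w}}(\mathbf{v}) = \sum_{i=1}^{d} w_i\, v_{(i)}, \qquad v_{(1)} \le \cdots \le v_{(d)}, \quad w_1 \ge \cdots \ge w_d \ge 0,$$
where $v_{(i)}$ denotes the objective values sorted in ascending order; placing the largest weights on the worst-off objectives makes the function concave and fairness-inducing.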
To address this limitation, we formalize the fair optimization problem in $\textit{multi-policy}$ MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences.
Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain within the convex coverage set (CCS), an approximate Pareto front obtained under linear scalarization. (2) We demonstrate that non-stationary policies augmented with accrued-reward histories, as well as stochastic policies, improve fairness by adapting dynamically to historical inequities. (3) We propose three novel algorithms: GGF integrated with multi-policy multi-objective Q-learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and a novel extension of the latter for learning stochastic policies.
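To make contribution (2) concrete, the sketch below is an illustrative assumption of ours, not the paper's implementation: it shows one way accrued reward histories could enter GGF-based greedy action selection over vector-valued Q-estimates, so that the agent favors objectives that have fallen behind.

```python
import numpy as np

def ggf(v, weights):
    """Generalized Gini welfare: dot product of the ascending-sorted
    objective values with non-increasing weights, so the worst-off
    objectives receive the largest weights."""
    return float(np.dot(np.sort(v), weights))

def ggf_greedy_action(q_values, accrued, weights):
    """Pick the action maximizing GGF of (accrued vector reward so far +
    vector Q-value), letting the choice adapt to historical inequities.
    q_values: array of shape (n_actions, n_objectives)."""
    scores = [ggf(accrued + q_a, weights) for q_a in q_values]
    return int(np.argmax(scores))

# Hypothetical usage: 3 actions, 2 objectives, Gini-style weights (2/3, 1/3).
q = np.array([[1.0, 0.2], [0.5, 0.6], [0.9, 0.1]])
accrued = np.array([0.0, 1.5])  # objective 2 is ahead so far
print(ggf_greedy_action(q, accrued, np.array([2 / 3, 1 / 3])))
```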
To validate the performance of the proposed algorithms, we perform experiments across multiple domains and compare our methods against state-of-the-art MORL baselines.
The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
Supplementary Material: pdf
Type Of Paper: Full paper (max page 8)
Anonymous Submission: Anonymized submission.
Submission Number: 43