Group Fairness in Reinforcement Learning

Published: 28 Apr 2023, Last Modified: 28 Apr 2023. Accepted by TMLR.
Abstract: We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on group notions of fairness, under which agents belonging to different groups should achieve similar performance according to some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward functions) and show that it is possible to design RL algorithms that learn the best fair policies without violating the fairness requirements at any point during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principles of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on a performance-difference-style approximate policy improvement update step, and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while keeping the learning process fair.
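The tabular algorithm described in the abstract pairs optimism (for return) with pessimism (for the fairness constraint). The following Python sketch illustrates that pairing on a small, finite set of candidate policies; it is a simplified illustration, not the paper's algorithm. It assumes that each policy's expected return and per-group performance have already been estimated with known confidence half-widths, and that "fair" means the largest gap between any two groups' performance is at most a tolerance epsilon. The names `PolicyEstimate`, `fairness_gap_upper`, `select_policy`, and the fallback policy `"pi_0"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PolicyEstimate:
    """Hypothetical container for one candidate policy's statistics."""
    name: str
    return_mean: float               # empirical expected return
    return_bonus: float              # confidence half-width for the return
    group_values: Dict[str, float]   # empirical performance measure per group
    group_bonus: float               # confidence half-width for each group value


def fairness_gap_upper(est: PolicyEstimate) -> float:
    """Pessimistic upper bound on the largest gap between any two groups.

    If every true group value lies within +/- group_bonus of its estimate,
    the true gap is at most (max - min of the estimates) + 2 * group_bonus.
    """
    vals = list(est.group_values.values())
    return (max(vals) - min(vals)) + 2 * est.group_bonus


def select_policy(candidates: List[PolicyEstimate], epsilon: float, fallback: str) -> str:
    # Pessimism: keep only policies whose worst-case group gap is within epsilon.
    certified = [c for c in candidates if fairness_gap_upper(c) <= epsilon]
    if not certified:
        # Nothing can be certified fair yet: play the (hypothetical) known-fair policy.
        return fallback
    # Optimism: among certified policies, pick the highest upper-confidence return.
    return max(certified, key=lambda c: c.return_mean + c.return_bonus).name


candidates = [
    PolicyEstimate("A", 10.0, 1.0, {"g1": 5.0, "g2": 4.0}, 0.5),   # gap bound 2.0
    PolicyEstimate("B", 8.0, 2.0, {"g1": 3.0, "g2": 2.75}, 0.25),  # gap bound 0.75
    PolicyEstimate("C", 7.0, 0.5, {"g1": 2.0, "g2": 2.0}, 0.1),    # gap bound 0.2
]

print(select_policy(candidates, epsilon=1.0, fallback="pi_0"))  # B
print(select_policy(candidates, epsilon=3.0, fallback="pi_0"))  # A
print(select_policy(candidates, epsilon=0.1, fallback="pi_0"))  # pi_0
```

In this sketch, a policy is executed only if it is fair under every value consistent with the confidence intervals, so no unfair policy is played as long as those intervals hold; when nothing can be certified, the learner falls back to a policy assumed to be fair in advance. Optimism on the return then drives exploration among the certified policies, as in standard UCB-style methods. Note how the tolerance matters: with epsilon = 3.0 the high-return but less fair policy A is chosen, while with epsilon = 1.0 it is excluded.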
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- We have provided more discussion of Assumption 2.3 and its necessity. We have also included a discussion in Section 5, where we suggest exploring recent advances in the policy fine-tuning and hybrid-RL literature for a practical relaxation of this assumption. We believe relaxing this assumption would require studying the setting in which an initial fair policy can be distilled from a previously collected offline dataset, and hence we leave this exploration to future work.
- We have included an additional example based on college admissions in Section 1 that demonstrates how the sequential effect of decisions motivates the fairness requirements.
Supplementary Material: zip
Assigned Action Editor: Michael Bowling
Submission Number: 700