Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making

Yuancheng Xu; Chenghao Deng; Yanchao Sun; Ruijie Zheng; Xiyao Wang; Jieyu Zhao; Furong Huang

19 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: societal considerations including fairness, safety, privacy

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Long-term Fairness; Fairness; Sequential decision making; Reinforcement Learning

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We provide a principled way to adapt static fairness notions to sequential decision-making and show how to adapt Reinforcement Learning methods for bias mitigation.

Abstract: Decisions made by machine learning models may have lasting impacts over time, making long-term fairness a crucial consideration. It has been shown that naively imposing a static fairness criterion while ignoring long-term effects can actually exacerbate bias over time. To explicitly address biases in sequential decision-making, recent works formulate long-term fairness notions in the Markov Decision Process (MDP) framework. They define the long-term bias as the sum of the static bias over all time steps. However, we demonstrate that naively summing the step-wise bias can create a false sense of fairness, because it fails to account for the differing importance of time steps during transitions. In this work, we introduce a long-term fairness notion called Equal Long-term Benefit Rate (ELBERT), which explicitly considers varying temporal importance and adapts static fairness principles to the sequential setting. Moreover, we show that the policy gradient of the Long-term Benefit Rate can be analytically reduced to the standard policy gradient. This makes standard policy optimization methods applicable for reducing bias, leading to our bias mitigation method, ELBERT-PO. Extensive experiments on diverse sequential decision-making environments consistently show that ELBERT-PO significantly reduces bias and maintains high utility.
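The claim that summed step-wise bias can give a false sense of fairness can be illustrated with a small sketch. It assumes, as a reading that the abstract itself does not spell out, that a group's benefit rate is supply divided by demand and that a long-term rate is the ratio of cumulative supply to cumulative demand. The function names, group labels, and numbers below are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: per-step (supply, demand) pairs for two groups.
# These numbers are illustrative assumptions, not results from the paper.
trajectories = {
    "group_a": [(0, 100), (100, 100)],
    "group_b": [(0, 1), (199, 199)],
}


def stepwise_bias_sum(traj):
    """Sum over time steps of the gap between per-step benefit rates."""
    total = 0.0
    # zip(*...) pairs up the groups' (supply, demand) tuples step by step.
    for step in zip(*traj.values()):
        rates = [supply / demand for supply, demand in step]
        total += max(rates) - min(rates)
    return total


def long_term_rate_gap(traj):
    """Gap between groups' ratios of cumulative supply to cumulative demand."""
    rates = [
        sum(supply for supply, _ in steps) / sum(demand for _, demand in steps)
        for steps in traj.values()
    ]
    return max(rates) - min(rates)


print("summed step-wise bias:", stepwise_bias_sum(trajectories))
print("long-term rate gap:", long_term_rate_gap(trajectories))
```

In this toy case both groups have identical benefit rates at every step, so the summed step-wise bias is zero. Their cumulative ratios still differ (100/200 versus 199/200, a gap of roughly 0.495), because the time steps carry very different amounts of demand for each group.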

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1977
