Keywords: Fairness; Long-term Fairness
Abstract: Decisions made by machine learning models may have lasting impacts over time, making long-term fairness a crucial consideration. It has been shown that, when the long-term effects of decisions are ignored, naively imposing fairness criteria in static settings can actually exacerbate bias over time. To explicitly address biases in sequential decision-making, recent works formulate long-term fairness notions in the Markov Decision Process (MDP) framework. They define long-term bias as the sum of static bias over each time step. However, we demonstrate that naively summing up the step-wise bias can create a false sense of fairness, since it fails to account for differences in the importance of states during transitions. In this work, we introduce a new long-term fairness notion called Equal Long-term Benefit Rate (ELBERT), which explicitly considers state importance and preserves the semantics of static fairness principles in the sequential setting. Moreover, we show that the policy gradient of the Long-term Benefit Rate can be analytically reduced to the standard policy gradient. This makes standard policy optimization methods applicable for reducing bias, leading to our proposed bias mitigation method ELBERT-PO. Experiments on three dynamic environments show that ELBERT-PO successfully reduces bias while maintaining high utility.
Submission Number: 22