Long-Term Fairness Without Utility Deterioration

TMLR Paper4887 Authors

19 May 2025 (modified: 26 May 2025) · Under review for TMLR · CC BY 4.0
Abstract: In fair machine learning, the trade-off between fairness and utility has been studied predominantly in static classification settings, neglecting long-term learning environments where the population distribution may shift in response to deployed model policies. This work investigates whether zero utility deterioration can be achieved in the long run. We introduce a Markov decision process (MDP) to formulate the interplay between model decisions and population distribution shifts. A key technical contribution is identifying a necessary and sufficient condition under which a model policy achieving long-term fairness does not compromise utility. Inspired by this condition, we propose effective reward functions that can be combined with online reinforcement learning algorithms, enabling the classifier to pursue dynamic control objectives such as steering population adaptations toward maximal fairness without sacrificing model performance. Experiments on both synthetic and real-world datasets suggest that the proposed reinforcement learning framework is effective in the long run, driving the classifier-population system toward a desirable equilibrium where the identified condition is met.
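To make the abstract's MDP formulation concrete, below is a minimal, self-contained Python sketch, not the authors' implementation. It assumes a toy two-group setting: the state is each group's qualification rate, the action is per-group acceptance rates, the reward combines classification utility with a demographic-parity penalty, and the population adapts its qualification rates toward the opportunity it receives. All names and dynamics (ClassifierPopulationMDP, lambda_fair, adapt_rate, the linear adaptation rule, and the random-search stand-in for online RL) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class ClassifierPopulationMDP:
    """Toy MDP: classifier decisions shift the population distribution (assumed dynamics)."""

    def __init__(self, lambda_fair=1.0, adapt_rate=0.1):
        self.lambda_fair = lambda_fair  # weight of the fairness term in the reward
        self.adapt_rate = adapt_rate    # speed of the population distribution shift
        self.reset()

    def reset(self):
        # State: qualification rate of each of two demographic groups.
        self.q = np.array([0.3, 0.7])
        return self.q.copy()

    def step(self, accept_rate):
        # Utility: gain from accepting qualified applicants minus the cost
        # of accepting unqualified ones (equal group sizes assumed).
        utility = np.sum(accept_rate * (2 * self.q - 1))
        # Fairness penalty: demographic-parity gap between the two groups.
        dp_gap = abs(accept_rate[0] - accept_rate[1])
        reward = utility - self.lambda_fair * dp_gap
        # Population adaptation: qualification rates drift toward the
        # acceptance rate each group currently receives (assumed rule).
        self.q = np.clip(self.q + self.adapt_rate * (accept_rate - self.q), 0.0, 1.0)
        return self.q.copy(), reward

def rollout(params, env, horizon=50):
    """Total reward of a linear policy a = clip(W q + b, 0, 1)."""
    W, b = params
    state = env.reset()
    total = 0.0
    for _ in range(horizon):
        action = np.clip(W @ state + b, 0.0, 1.0)
        state, reward = env.step(action)
        total += reward
    return total

# Crude stand-in for an online RL algorithm: random search over linear policies.
best_params, best_ret = None, -np.inf
for _ in range(500):
    params = (rng.normal(size=(2, 2)), rng.normal(size=2))
    ret = rollout(params, ClassifierPopulationMDP())
    if ret > best_ret:
        best_params, best_ret = params, ret
print(f"best return over 50 steps: {best_ret:.2f}")
```

Under these assumed dynamics, a policy that equalizes acceptance rates while tracking rising qualification rates collects both the utility and the fairness terms, loosely mirroring the equilibrium the abstract describes; the paper's actual reward design and RL algorithm are, of course, specified in the full text.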
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Meisam_Razaviyayn1
Submission Number: 4887