Abstract: Reinforcement learning is a commonly used technique for optimising objectives in decision support systems for complex problem solving. When these systems affect individuals or groups, it is essential to reflect on fairness. As absolute fairness is not achievable in practice, we propose a framework that allows distinct fairness notions to be balanced alongside the primary objective. To this end, we formulate group and individual fairness as sequential fairness notions. First, we present an extended Markov decision process, ƒMDP, that is explicitly aware of individuals and groups. Next, we formalise fairness notions in terms of this ƒMDP, which allows us to evaluate the primary objective together with the fairness notions that are important to the user, taking a multi-objective reinforcement learning approach. To evaluate our framework, we consider two scenarios that require distinct aspects of the performance-fairness trade-off: job hiring and fraud detection. The objective in job hiring is to compose strong teams, while providing equal treatment to similar individual applicants and to groups in society. The trade-off in fraud detection lies in the necessity of detecting fraudulent transactions, while fairly distributing over customers the burden of checking transactions. Within this framework, we further explore the influence of distance metrics on individual fairness and highlight the impact of the history size on the fairness calculations and on the fairness obtainable through exploration.