Group Fairness in Predict-Then-Optimize Settings for Restless Bandits

Shresth Verma; Yunfan Zhao; Sanket Shah; Niclas Boehmer; Aparna Taneja; Milind Tambe


Published: 26 Apr 2024, Last Modified: 15 Jul 2024

Venue: UAI 2024 (oral)

License: CC BY 4.0

Keywords: Fairness, predict-then-optimize, decision-focused-learning, restless bandits

TL;DR: We develop a decision-focused-learning pipeline to solve equitable RMABs, using a novel budget allocation algorithm to prevent disparity between groups.

Abstract: Restless multi-arm bandits (RMABs) are a model for sequentially allocating a limited number of resources to agents modeled as Markov Decision Processes. RMABs have applications in cellular networks, anti-poaching, and in particular, healthcare. For such high-stakes use cases, allocations are often required to treat different groups of agents (e.g., defined by sensitive attributes) fairly. In addition to the fairness challenge, agents' transition probabilities are often unknown and need to be learned in real-world problems. Thus, group fairness in RMABs requires us to simultaneously learn transition probabilities and how much budget we allocate to each group. Overcoming this key challenge ignored by previous work, we develop a decision-focused-learning pipeline to solve equitable RMABs, using a novel budget allocation algorithm to prevent disparity between groups. Our results on both synthetic and real-world large-scale datasets demonstrate that incorporating fair planning into the learning step greatly improves equity with little sacrifice in utility.
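To make the budget-allocation tension in the abstract concrete, here is a minimal sketch of a single allocation round with and without per-group budgets. It is not the paper's algorithm: it assumes two-state arms whose transition probabilities are already known (the paper learns them), it ranks arms by a myopic one-step gain rather than by a Whittle index or a learned policy, and the names `one_step_gain`, `allocate`, and `group_budgets` are hypothetical.

```python
def one_step_gain(p_passive, p_active):
    """Increase in the probability of reaching the good state when we act."""
    return p_active - p_passive


def allocate(arms, budget, group_budgets=None):
    """Choose which arms to act on in one round.

    arms: list of (arm_id, group, p_passive, p_active) tuples (assumed known).
    budget: total number of arms we may act on.
    group_budgets: optional dict mapping group -> max arms chosen from it.
    """
    # Rank arms by myopic gain, highest first (sorted() is stable on ties).
    ranked = sorted(arms, key=lambda a: one_step_gain(a[2], a[3]), reverse=True)

    if group_budgets is None:
        # Utility-only allocation: take the top `budget` arms overall.
        return [arm_id for arm_id, _, _, _ in ranked[:budget]]

    if sum(group_budgets.values()) > budget:
        raise ValueError("group budgets exceed the total budget")

    # Group-constrained allocation: take the best arms within each group's cap.
    remaining = dict(group_budgets)
    chosen = []
    for arm_id, group, _, _ in ranked:
        if remaining.get(group, 0) > 0:
            chosen.append(arm_id)
            remaining[group] -= 1
    return chosen


# Toy instance (made-up numbers): group "A" arms respond more strongly.
arms = [
    (0, "A", 0.2, 0.9),
    (1, "A", 0.3, 0.8),
    (2, "B", 0.4, 0.6),
    (3, "B", 0.5, 0.6),
]

unconstrained = allocate(arms, budget=2)
per_group = allocate(arms, budget=2, group_budgets={"A": 1, "B": 1})
```

In this toy instance the unconstrained rule spends the whole budget on group "A", while the per-group caps force one action into each group. Choosing those caps well, while the transition probabilities themselves are being learned, is the problem the abstract describes.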

List Of Authors: Verma, Shresth and Zhao, Yunfan and Shah, Sanket and Boehmer, Niclas and Taneja, Aparna and Tambe, Milind


Submission Number: 499
