On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Unconscious bias has been shown to influence how we assess our peers, with consequences for hiring, promotions, and admissions. In this work, we focus on affinity bias, the component of unconscious bias that leads us to prefer people who are similar to us, despite no deliberate intention of favoritism. In a world where the people hired today become part of the hiring committee of tomorrow, we are particularly interested in understanding (and mitigating) how affinity bias affects this feedback loop. This problem has two distinctive features: (1) we only observe the _biased value_ of a candidate, but we want to optimize with respect to their _real value_; (2) the bias towards a candidate with a specific set of traits depends on the _fraction_ of people in the hiring committee with the same set of traits. We introduce a new bandit variant that exhibits these two features, which we call affinity bandits. Unsurprisingly, classical algorithms such as UCB often fail to identify the best arm in this setting. We prove a new instance-dependent regret lower bound, which is larger than the one in the standard bandit setting by a multiplicative function of $K$. Because the rewards are _time-varying_ and _dependent on the policy's past actions_, deriving this lower bound requires proof techniques beyond the standard bandit toolkit. Finally, we design an elimination-style algorithm whose regret nearly matches this lower bound, despite never observing the real rewards.
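To make the feedback loop concrete, below is a minimal sketch of the setting in Python. The specific bias form (the observed reward scaled multiplicatively by the fraction of committee members sharing the arm's trait) and the choice of running vanilla UCB are illustrative assumptions for this sketch, not the paper's exact model or algorithm; the abstract only specifies that the bias depends on that fraction and that UCB-style methods often fail here.

```python
import numpy as np

# Illustrative sketch of an affinity-bandits feedback loop. Each arm is a
# candidate "trait"; hired candidates join the committee, and the bias on
# future observations depends on the committee's composition. The
# multiplicative bias below is an assumed form, not the paper's definition.

rng = np.random.default_rng(0)

K = 4                                       # number of arms / traits
true_means = rng.uniform(0.2, 0.8, size=K)  # real values, never observed
committee = [int(rng.integers(K))]          # initial one-member committee

T = 5_000
counts = np.zeros(K)
biased_sums = np.zeros(K)

for t in range(T):
    # Vanilla UCB computed on the *biased* observations; the abstract
    # notes such classical algorithms often fail to find the best arm.
    n = np.maximum(counts, 1)
    ucb = biased_sums / n + np.sqrt(2 * np.log(t + 1) / n)
    arm = int(np.argmax(np.where(counts > 0, ucb, np.inf)))

    real_reward = rng.binomial(1, true_means[arm])

    # Bias depends on the fraction of committee members with this trait.
    frac = np.mean([m == arm for m in committee])
    observed = real_reward * frac           # assumed multiplicative bias

    counts[arm] += 1
    biased_sums[arm] += observed
    committee.append(arm)                   # the hire joins the committee

print("real best arm:  ", int(np.argmax(true_means)))
print("most-pulled arm:", int(np.argmax(counts)))
```

Running this sketch typically shows UCB locking onto whichever trait dominates the early committee, since that trait's observations are inflated relative to the others.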
Lay Summary: Unconscious bias has been shown to influence how we assess our peers, with consequences for hiring, promotions, and admissions. These assessments often have downstream consequences: for instance, the people we hire today may influence, directly or indirectly, the people we hire in the future. How do these unconscious biases affect this feedback loop? We focus on two essential features of unconscious bias: (1) we only observe the biased value of a candidate, but we want to optimize with respect to their real value; (2) the bias towards a candidate with a specific set of traits depends on the fraction of people in the hiring committee with the same set of traits. We introduce a new bandit variant that exhibits these two features, which we call affinity bandits. We provide a near-tight characterization of this problem, deriving a new lower bound as well as an algorithm whose performance nearly matches this bound. Our work highlights the many challenges that arise when making decisions in the presence of bias, even when the bias structure is known. We hope our techniques pave the way towards understanding when and how decision-makers can mitigate the negative consequences of unconscious biases.
Primary Area: Theory->Online Learning and Bandits
Keywords: Fairness, Bandits, Lower Bounds, Affinity Bias, Biased Feedback
Submission Number: 12635