Causal Logistic Bandits with Counterfactual Fairness Constraints

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-ND 4.0
TL;DR: We analyze logistic bandits regret with a long-term constraint for counterfactual fairness.
Abstract: Artificial intelligence will play a significant role in decision making across many areas of society. Although numerous fairness criteria have been proposed in the machine learning community, fairness defined through specified attributes in a sequential decision-making framework remains under-investigated. In this paper, we focus on causal logistic bandit problems in which the learner seeks to make fair decisions under a notion of fairness that accounts for counterfactual reasoning. We propose and analyze a primal-dual optimization algorithm for constrained causal logistic bandits in which the non-linear constraints are a priori unknown and must be learned over time. We obtain sub-linear regret guarantees whose leading term matches that of unconstrained logistic bandits (Lee et al., 2024), while guaranteeing sub-linear constraint violations. We also show how to achieve zero cumulative constraint violations at the cost of a small increase in the regret bound.
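The primal-dual idea behind the abstract can be illustrated with a toy sketch: play the arm that maximizes a Lagrangian score (estimated logistic reward minus a dual-weighted constraint cost), then update the dual variable by projected gradient ascent on the observed violation. This is not the paper's algorithm; the arm features, per-arm `costs`, `budget`, and step sizes below are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def primal_dual_logistic_bandit(arms, theta_star, costs, T=2000,
                                budget=0.2, eta_dual=0.05, lr=0.5, seed=0):
    """Toy primal-dual loop: logistic rewards, known per-arm constraint
    costs, and a long-term budget on the average cost (all hypothetical)."""
    rng = np.random.default_rng(seed)
    theta_hat = np.zeros(arms.shape[1])  # online logistic-regression estimate
    lam = 0.0                            # dual variable for the constraint
    total_cost = 0.0
    for _ in range(T):
        # Lagrangian scores: estimated reward minus dual-penalized cost.
        scores = sigmoid(arms @ theta_hat) - lam * costs
        a = int(np.argmax(scores))
        x = arms[a]
        # Bernoulli reward drawn from the true logistic model.
        r = float(rng.random() < sigmoid(x @ theta_star))
        # One SGD step on the logistic log-loss.
        theta_hat += lr * (r - sigmoid(x @ theta_hat)) * x
        # Projected dual ascent on the per-round constraint violation.
        total_cost += costs[a]
        lam = max(0.0, lam + eta_dual * (costs[a] - budget))
    return theta_hat, lam, total_cost / T

# Arm 0 is rewarding but costly; the dual variable steers play elsewhere.
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
theta_star = np.array([2.0, -1.0])
costs = np.array([0.5, 0.0, 0.25])
theta_hat, lam, avg_cost = primal_dual_logistic_bandit(arms, theta_star, costs)
```

The dual variable grows whenever a round's cost exceeds the budget and shrinks otherwise, so the long-run average cost is driven toward the budget, mirroring the sub-linear constraint-violation guarantee the abstract describes.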
Lay Summary:
- We study fairness in an online learning framework, where fairness is defined using counterfactual reasoning, that is, reasoning about what would have happened if an individual had belonged to a different demographic group.
- Our framework allows the learning algorithm to make potentially unfair decisions, but penalizes the expected reward for actions that exhibit large counterfactual fairness violations.
- These results contribute to the broader research on causal decision-making systems.
Primary Area: General Machine Learning->Online Learning, Active Learning and Bandits
Keywords: bandits, causality, fairness
Submission Number: 11680