Fairness of Exposure in Stochastic Multiple-play Multi-armed Bandits

Youngmi Jin; Dongdeok Kim; Young-Joo Suh

Published: 25 May 2026, Last Modified: 27 May 2026. Venue: DEMO 2026 Poster. License: CC BY 4.0.

Keywords: Fairness of exposure, multi-armed bandit, fairness, multiple-play multi-armed bandit

Abstract: We study a stochastic multiple-play multi-armed bandit (MAB) problem under semi-bandit feedback, in which a decision maker selects $K$ arms from a set of $M$ arms at each time step, subject to fairness constraints requiring that each arm be selected at least a predefined fraction of the time. The objective is to maximize the cumulative expected reward while satisfying the fairness constraints. Under mild conditions, we characterize an optimal policy for the fair multiple-play MAB problem and, based on this characterization, propose a class of algorithms called Fair-MMAB(K). We show that Fair-MMAB(K) satisfies the fairness constraints at every time step, regardless of the choice of UCB index, and achieves $O(1)$ fairness-aware regret when instantiated with UCB1 or KL-UCB. Numerical experiments validate our theoretical findings and demonstrate that Fair-MMAB(K) outperforms existing fair multiple-play MAB algorithms.
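To make the problem setting concrete, the sketch below computes a fair benchmark allocation when the arm means are known. It is an illustration under stated assumptions, not the paper's characterization and not Fair-MMAB(K): it assumes the benchmark can be read as a linear program that maximizes $\sum_i \mu_i x_i$ subject to $f_i \le x_i \le 1$ and $\sum_i x_i = K$, where $x_i$ is the long-run selection fraction of arm $i$ and $f_i$ is its required minimum fraction. The function name `fair_oracle_allocation` and the symbols $\mu_i$, $x_i$, $f_i$ are hypothetical and do not come from the source.

```python
def fair_oracle_allocation(means, K, min_fractions):
    """Greedy solution of the assumed LP: maximize sum(means[i] * x[i])
    subject to min_fractions[i] <= x[i] <= 1 and sum(x) == K.

    Illustrative benchmark only; not the algorithm proposed in the paper.
    """
    M = len(means)
    if len(min_fractions) != M:
        raise ValueError("means and min_fractions must have the same length")
    if not 0 < K <= M:
        raise ValueError("K must satisfy 0 < K <= M")
    if any(f < 0 or f > 1 for f in min_fractions):
        raise ValueError("each minimum fraction must lie in [0, 1]")
    if sum(min_fractions) > K:
        raise ValueError("fairness constraints are infeasible: sum of fractions exceeds K")

    # Start every arm at its guaranteed minimum selection fraction.
    x = list(min_fractions)
    budget = K - sum(x)

    # Spend the remaining budget on the arms with the highest means,
    # never letting any arm exceed a selection fraction of 1.
    for i in sorted(range(M), key=lambda i: means[i], reverse=True):
        extra = min(1.0 - x[i], budget)
        x[i] += extra
        budget -= extra

    expected_reward_per_step = sum(m * xi for m, xi in zip(means, x))
    return x, expected_reward_per_step


if __name__ == "__main__":
    x, value = fair_oracle_allocation([0.9, 0.8, 0.5, 0.2], K=2, min_fractions=[0.1] * 4)
    print(x, value)
```

With means `[0.9, 0.8, 0.5, 0.2]`, `K = 2`, and a minimum fraction of 0.1 per arm, the allocation is approximately `[1.0, 0.8, 0.1, 0.1]`: the two weakest arms receive only their guaranteed share, and the leftover budget goes to the strongest arms. A fairness-aware regret would then plausibly be measured against the per-step value of such a benchmark rather than against the unconstrained top-$K$ choice.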

Submission Number: 85
