Keywords: Multi-Objective Bandits, Bandits with Heavy-Tail Rewards, Robust Distribution Estimation
TL;DR: We propose DistLCB, a multi-risk bandit algorithm for heavy-tailed rewards that leverages Wasserstein-based confidence bounds to achieve Pareto-optimality and provable regret guarantees.
Abstract: This paper addresses the problem of multi-risk-measure-agnostic multi-armed bandits in heavy-tailed reward settings.
We propose a framework that leverages novel deviation inequalities for the $1$-Wasserstein distance to construct confidence intervals for Lipschitz risk measures. 
We introduce the distributional LCB (DistLCB) algorithm and establish its asymptotic optimality by deriving the first lower bounds for risk-measure-aware bandits with explicit sub-optimality-gap dependencies.
DistLCB is further extended to multi-risk objectives, enabling Pareto-optimal solutions that account for multiple aspects of reward distributions.
Additionally, we provide a regret analysis that includes both gap-dependent and gap-independent bounds for multi-risk settings. 
Experiments validate the effectiveness of the proposed methods in synthetic and real-world applications.
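
To make the confidence-interval construction mentioned above concrete, the following is a sketch of the standard chaining argument, assuming the risk measure $\rho$ is $L$-Lipschitz with respect to $W_1$ and that the paper's deviation inequality supplies a radius $\varepsilon_n(\delta)$ with $\Pr\big(W_1(\hat F_n, F) > \varepsilon_n(\delta)\big) \le \delta$ (the exact form of $\varepsilon_n(\delta)$ is not reproduced here):

$$
|\rho(\hat F_n) - \rho(F)| \;\le\; L \cdot W_1(\hat F_n, F) \;\le\; L\,\varepsilon_n(\delta)
\quad \text{with probability at least } 1-\delta,
$$

so $\big[\rho(\hat F_n) - L\,\varepsilon_n(\delta),\; \rho(\hat F_n) + L\,\varepsilon_n(\delta)\big]$ is a $(1-\delta)$-confidence interval for $\rho(F)$.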
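For intuition only, here is a minimal Python sketch of a DistLCB-style selection rule, assuming a loss-minimization convention, empirical CVaR as an example $W_1$-Lipschitz risk measure (with Lipschitz constant $1/\alpha$), and a generic placeholder radius of order $\sqrt{\log(1/\delta)/n}$. The names (`cvar`, `dist_lcb`), the radius, and the constants are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def cvar(samples, alpha=0.1):
    """Empirical CVaR_alpha of losses: mean of the worst alpha-fraction.
    CVaR_alpha is (1/alpha)-Lipschitz w.r.t. the 1-Wasserstein distance."""
    samples = np.asarray(samples, dtype=float)
    q = np.quantile(samples, 1.0 - alpha)        # VaR threshold
    return samples[samples >= q].mean()          # average loss beyond it

def dist_lcb(arms, horizon, risk=cvar, lipschitz=10.0, delta=0.05):
    """DistLCB-style loop for loss minimization (illustrative sketch).

    arms: list of zero-argument callables, each returning one loss sample.
    The radius below is a placeholder of order sqrt(log(1/delta)/n); the
    paper's W_1 deviation inequality would supply the exact radius.
    """
    history = [[arm()] for arm in arms]          # pull each arm once
    for _ in range(len(arms), horizon):
        lcbs = []
        for s in history:
            n = len(s)
            eps = np.sqrt(np.log(1.0 / delta) / n)   # placeholder radius
            lcbs.append(risk(s) - lipschitz * eps)   # Lipschitz * W_1 radius
        k = int(np.argmin(lcbs))                 # optimism under minimization
        history[k].append(arms[k]())
    return [len(s) for s in history]             # pull counts per arm

# Example: two heavy-tailed loss arms (Pareto tails, shifted scales).
rng = np.random.default_rng(0)
arms = [lambda: rng.pareto(2.5), lambda: 0.5 + rng.pareto(2.5)]
print(dist_lcb(arms, horizon=500))
```

With `alpha=0.1` the choice `lipschitz=10.0` matches the $1/\alpha$ Lipschitz constant of CVaR; any $W_1$-Lipschitz risk measure could be substituted for `cvar`.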
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 6460