Keywords: Multi-Objective Bandits, Bandits with Heavy-Tail Rewards, Robust Distribution Estimation
TL;DR: We propose DistLCB, a multi-risk bandit algorithm for heavy-tailed rewards that leverages Wasserstein-based confidence bounds to achieve Pareto-optimality and provable regret guarantees.
Abstract: This paper addresses the problem of multi-risk-measure-agnostic multi-armed bandits in heavy-tailed reward settings.
We propose a framework that leverages novel deviation inequalities for the $1$-Wasserstein distance to construct confidence intervals for Lipschitz risk measures. 
We introduce the distributional LCB (DistLCB) algorithm and establish its asymptotic optimality by deriving the first lower bounds for risk-measure-aware bandits with explicit sub-optimality-gap dependencies.
DistLCB is further extended to multi-risk objectives, enabling Pareto-optimal solutions that account for multiple aspects of reward distributions.
Additionally, we provide a regret analysis that includes both gap-dependent and gap-independent bounds for multi-risk settings. 
Experiments validate the effectiveness of the proposed methods in synthetic and real-world applications.
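
To make the confidence-interval construction mentioned above concrete, the following is a sketch of the standard chaining argument, assuming the risk measure $\rho$ is $L$-Lipschitz with respect to $W_1$ and that the paper's deviation inequality supplies a radius $\varepsilon_n(\delta)$ with $\Pr\big(W_1(\hat F_n, F) > \varepsilon_n(\delta)\big) \le \delta$ (the exact form of $\varepsilon_n(\delta)$ is not reproduced here):

$$
|\rho(\hat F_n) - \rho(F)| \;\le\; L \cdot W_1(\hat F_n, F) \;\le\; L\,\varepsilon_n(\delta)
\quad \text{with probability at least } 1-\delta,
$$

so $\big[\rho(\hat F_n) - L\,\varepsilon_n(\delta),\; \rho(\hat F_n) + L\,\varepsilon_n(\delta)\big]$ is a $(1-\delta)$-confidence interval for $\rho(F)$.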
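For intuition only, here is a minimal Python sketch of a DistLCB-style selection rule, assuming a loss-minimization convention, empirical CVaR as an example $W_1$-Lipschitz risk measure (with Lipschitz constant $1/\alpha$), and a generic placeholder radius of order $\sqrt{\log(1/\delta)/n}$. The names (`cvar`, `dist_lcb`), the radius, and the constants are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def cvar(samples, alpha=0.1):
    """Empirical CVaR_alpha of losses: mean of the worst alpha-fraction.
    CVaR_alpha is (1/alpha)-Lipschitz w.r.t. the 1-Wasserstein distance."""
    samples = np.asarray(samples, dtype=float)
    q = np.quantile(samples, 1.0 - alpha)        # VaR threshold
    return samples[samples >= q].mean()          # average loss beyond it

def dist_lcb(arms, horizon, risk=cvar, lipschitz=10.0, delta=0.05):
    """DistLCB-style loop for loss minimization (illustrative sketch).

    arms: list of zero-argument callables, each returning one loss sample.
    The radius below is a placeholder of order sqrt(log(1/delta)/n); the
    paper's W_1 deviation inequality would supply the exact radius.
    """
    history = [[arm()] for arm in arms]          # pull each arm once
    for _ in range(len(arms), horizon):
        lcbs = []
        for s in history:
            n = len(s)
            eps = np.sqrt(np.log(1.0 / delta) / n)   # placeholder radius
            lcbs.append(risk(s) - lipschitz * eps)   # Lipschitz * W_1 radius
        k = int(np.argmin(lcbs))                 # optimism under minimization
        history[k].append(arms[k]())
    return [len(s) for s in history]             # pull counts per arm

# Example: two heavy-tailed loss arms (Pareto tails, shifted scales).
rng = np.random.default_rng(0)
arms = [lambda: rng.pareto(2.5), lambda: 0.5 + rng.pareto(2.5)]
print(dist_lcb(arms, horizon=500))
```

With `alpha=0.1` the choice `lipschitz=10.0` matches the $1/\alpha$ Lipschitz constant of CVaR; any $W_1$-Lipschitz risk measure could be substituted for `cvar`.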
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 6460