Keywords: multi-objective, exploration-free, linear bandit
TL;DR: We study unexpected benefits that arise from multiple objectives.
Abstract: Multi-objective decision-making is often deemed overly complex in bandit settings, leading to algorithms that are both complicated and frequently impractical. In this paper, we challenge that notion by showing that, under a novel *goodness of arms* condition, multiple objectives can actually facilitate learning, enabling simple near-greedy methods to achieve sub-linear Pareto regret. To our knowledge, this is the first work to demonstrate the effectiveness of near-greedy algorithms for multi-objective bandits, and also the first to study the regret of such algorithms for parametric bandits in the absence of context distributional assumptions. We further introduce a framework for *objective fairness*, supported by strong theoretical and empirical evidence, illustrating that multi-objective bandit problems can become both simpler and more efficient to solve.
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 22383