Keywords: Multi-Objective Multi-armed Bandit, Linear Contextual Bandit, Pareto Optimality, Thompson Sampling
TL;DR: We propose the first Thompson Sampling algorithm with Pareto regret guarantees for multi-objective linear contextual bandits.
Abstract: We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose $\texttt{MOL-TS}$, the first Thompson Sampling algorithm with Pareto regret guarantees for this problem. Unlike standard approaches that compute an empirical Pareto front each round, $\texttt{MOL-TS}$ samples parameters across objectives and efficiently selects an arm from a novel effective Pareto front, which accounts for repeated selections over time. Our analysis shows that $\texttt{MOL-TS}$ achieves a worst-case Pareto regret bound of $\widetilde{O}(d^{3/2}\sqrt{T})$, where $d$ is the dimension of the feature vectors and $T$ is the total number of rounds, matching the best known order for randomized single-objective linear bandit algorithms. Empirical results confirm the benefits of our proposed approach, demonstrating improved regret minimization and strong multi-objective performance.
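To make the sampling-then-Pareto-selection idea concrete, below is a minimal sketch of one style of multi-objective linear Thompson Sampling: per-objective parameters are sampled from a Gaussian around the ridge estimate, and an arm is drawn from the Pareto front of the sampled reward vectors. This is an illustrative assumption, not the paper's $\texttt{MOL-TS}$ (which selects from a novel effective Pareto front); all names and constants (`n_objectives`, `sigma`, `lam`, the uniform choice on the front) are hypothetical.

```python
import numpy as np

# Hedged sketch: one possible multi-objective linear Thompson Sampling loop
# with naive Pareto-front arm selection. Not the paper's MOL-TS algorithm.

rng = np.random.default_rng(0)
d, n_arms, n_objectives, T = 5, 10, 2, 1000
lam, sigma = 1.0, 0.5  # ridge regularizer and exploration scale (assumed)

true_theta = rng.normal(size=(n_objectives, d))  # unknown parameters (simulated)
V = lam * np.eye(d)                              # shared Gram matrix
b = np.zeros((n_objectives, d))                  # per-objective reward-weighted sums

def pareto_front(rewards):
    """Indices of arms not dominated across all objectives by any other arm."""
    front = []
    for i in range(len(rewards)):
        dominated = any(
            np.all(rewards[j] >= rewards[i]) and np.any(rewards[j] > rewards[i])
            for j in range(len(rewards)) if j != i
        )
        if not dominated:
            front.append(i)
    return front

for t in range(T):
    X = rng.normal(size=(n_arms, d))             # context features for each arm
    V_inv = np.linalg.inv(V)
    # Sample one parameter vector per objective around its ridge estimate.
    sampled = np.stack([
        rng.multivariate_normal(V_inv @ b[m], sigma**2 * V_inv)
        for m in range(n_objectives)
    ])
    est_rewards = X @ sampled.T                  # shape (n_arms, n_objectives)
    arm = rng.choice(pareto_front(est_rewards))  # uniform over the sampled front
    noisy = X[arm] @ true_theta.T + rng.normal(scale=0.1, size=n_objectives)
    V += np.outer(X[arm], X[arm])                # rank-one Gram update
    b += noisy[:, None] * X[arm]
```

The key difference from a single-objective sampler is that the argmax over estimated rewards is replaced by a draw from the set of non-dominated arms, so no objective is systematically sacrificed.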
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 9988