Cluster Agnostic Network Lasso Bandits

Published: 17 Jul 2025, Last Modified: 06 Sept 2025, EWRL 2025 Poster, CC BY 4.0
Keywords: Network Lasso, Online Learning, Multi-Armed Bandits
Abstract: We consider a multi-task contextual bandit setting in which the learner is given a graph encoding relations between the bandit tasks. The tasks' preference vectors are assumed to be piecewise constant over the graph, forming clusters. At every round, we estimate the preference vectors by solving an online network lasso problem with a suitably chosen, time-dependent regularization parameter. We establish a novel oracle inequality relying on a convenient restricted eigenvalue assumption. Our theoretical findings highlight the importance of dense intra-cluster connections and sparse inter-cluster ones, which yields a sublinear regret bound significantly lower than its counterpart in the independent task learning setting. Finally, we support our theoretical findings with an experimental evaluation against graph bandit multi-task learning and online clustering of bandits algorithms.
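
The sketch below illustrates the kind of per-round estimation step the abstract describes: a network lasso fit over the task graph with a time-dependent regularization weight. It is a minimal sketch, not the authors' implementation; the linear-reward model, squared-loss data fit, and the names (`contexts`, `rewards`, `edges`, `lam_t`) are assumptions made for illustration, and the choice of `lam_t` is a placeholder for the paper's schedule.

```python
# Illustrative sketch of an online network lasso estimation step (not the
# authors' code). Assumes linear rewards and a squared-loss data-fit term.
import cvxpy as cp


def network_lasso_estimate(contexts, rewards, edges, n_tasks, dim, lam_t):
    """Estimate one preference vector per task (graph node).

    contexts[i] : (m_i, dim) array of observed contexts for task i.
    rewards[i]  : (m_i,) array of observed rewards for task i.
    edges       : list of (i, j) pairs forming the task graph.
    lam_t       : time-dependent regularization weight (assumed schedule).
    """
    W = cp.Variable((n_tasks, dim))

    # Data-fit term: squared error on each task's past observations.
    fit = 0
    for i in range(n_tasks):
        if len(rewards[i]) > 0:
            fit += cp.sum_squares(contexts[i] @ W[i] - rewards[i])

    # Network lasso penalty: sum of unsquared l2 norms of edge differences,
    # which encourages piecewise-constant (clustered) preference vectors.
    penalty = sum(cp.norm(W[i] - W[j], 2) for (i, j) in edges)

    cp.Problem(cp.Minimize(fit + lam_t * penalty)).solve()
    return W.value
```

In this reading, a denser intra-cluster and sparser inter-cluster edge set makes the penalty pull same-cluster vectors together without blurring cluster boundaries, matching the abstract's emphasis on graph structure.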
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Behzad_Nourani-Koliji1, ~Steven_Bilaj1
Track: Regular Track: unpublished work
Submission Number: 97