Cluster Agnostic Network Lasso Bandits

TMLR Paper5195 Authors

24 Jun 2025 (modified: 30 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: We consider a multi-task contextual bandit setting, where the learner is given a graph encoding relations between the bandit tasks. The tasks' preference vectors are assumed to be piecewise constant over the graph, forming clusters. At every round, we estimate the preference vectors by solving an online network lasso problem with a suitably chosen, time-dependent regularization parameter. We establish a novel oracle inequality relying on a convenient restricted eigenvalue assumption. Our theoretical findings highlight the importance of dense intra-cluster connections and sparse inter-cluster ones. This results in a sublinear regret bound significantly lower than its counterpart in the independent task learning setting. Finally, we support our theoretical findings with an experimental evaluation against graph bandit multi-task learning and online clustering of bandits algorithms.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Huazheng_Wang1
Submission Number: 5195