Keywords: bandits, knapsack, clustering
Abstract: In this work, we study the problem of clustered linear contextual bandits with knapsack constraints, a setting that closely models real-world recommender systems. In such systems, the overwhelming number of items makes it impractical to explore all options, and overexposing certain items can harm content diversity and fairness. To address these challenges, our algorithm clusters actions to enable knowledge transfer across similar items and incorporates global resource constraints to limit over-consumption. We provide a formal analysis showing that the algorithm achieves sublinear regret in the number of time periods, even without access to the full action set. Notably, we prove that it suffices to perform clustering once on a randomly selected subset of actions.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 14843