Keywords: bandits, knapsack, clustering
Abstract: In this work, we study the problem of clustered linear contextual bandits with knapsack constraints, a setting that closely models real-world recommender systems. In such systems, the overwhelming number of items makes it impractical to explore all options, and overexposing certain items can harm content diversity and fairness. To address these challenges, our algorithm clusters actions to enable knowledge transfer across similar items and incorporates global resource constraints to limit over-consumption. We provide a formal analysis showing that the algorithm achieves sublinear regret in the number of time periods, even without access to the full action set. Notably, we prove that it suffices to perform clustering once on a randomly selected subset of actions.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 14843