Clustering Items through Bandit Feedback: Finding the Right Feature out of Many

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We consider a problem of clustering in an adaptive and sequential setting, and provide an efficient and optimal algorithm.
Abstract: We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown groups, such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of the item's feature. The learner's objective is to recover the correct partition of the items while keeping the number of observations as small as possible. We provide an algorithm that identifies a feature relevant to the clustering task, leveraging the Sequential Halving algorithm. With probability at least $1-\delta$, it accurately recovers the partition, and we derive an upper bound on the required budget. Furthermore, we derive an instance-dependent lower bound, which is tight in some relevant cases.
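The feature-finding step leverages Sequential Halving, a standard fixed-budget pure-exploration routine. A minimal sketch of that subroutine is below; the `pull` callback and the way the budget is split are illustrative assumptions, not the paper's actual algorithm or API:

```python
import math
import random

def sequential_halving(pull, n_arms, budget):
    """Fixed-budget best-arm identification via Sequential Halving.

    `pull(i)` returns one noisy observation of arm i's mean reward
    (here, an arm would correspond to one feature coordinate).
    The budget is split evenly across ~log2(n_arms) rounds; after
    each round, the empirically worse half of the arms is discarded.
    """
    arms = list(range(n_arms))
    n_rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(n_rounds):
        if len(arms) == 1:
            break
        # Equal share of the budget for each surviving arm this round.
        pulls_per_arm = max(1, budget // (n_rounds * len(arms)))
        means = {
            i: sum(pull(i) for _ in range(pulls_per_arm)) / pulls_per_arm
            for i in arms
        }
        # Keep the empirically better half.
        arms.sort(key=lambda i: means[i], reverse=True)
        arms = arms[: math.ceil(len(arms) / 2)]
    return arms[0]
```

For instance, with 8 arms where only arm 7 has mean 1.0 and the rest are zero-mean noise, the routine concentrates its budget on the surviving candidates and returns arm 7 with high probability.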
Lay Summary: Imagine you're trying to group a collection of objects, such as images of vehicles, into two categories, but you don't know the properties that define those categories. Each object has many characteristics (such as the number of visible wheels), and checking a characteristic takes time and resources. On top of that, each observation is noisy: it can contain errors. To solve this task, we proceed sequentially: one step at a time, we choose an object and a characteristic to observe on that object, aiming to gradually uncover both which characteristics matter and how the objects are grouped. We ask the following question: can we correctly sort all the objects while observing as few characteristics as possible? The key idea is that some features are especially informative for distinguishing between the two groups. Our method learns to identify and focus on these useful features throughout the observation process. We evaluate the performance of our method by measuring the number of observations it requires, and we show that no other method can achieve the same goal with significantly fewer observations.
Link To Code: https://github.com/grafmaxi/bandit_two_clusters
Primary Area: General Machine Learning->Online Learning, Active Learning and Bandits
Keywords: clustering, bandit theory, pure exploration, information-theoretic bounds, machine learning
Submission Number: 11008