Keywords: Theory of neural networks, Clustering, Transformers, Attention-based models, Mixture models, Optimization, Unsupervised learning
Abstract: Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids.
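Because the abstract does not spell out the exact attention simplification or population risk, the following Python sketch is only an illustrative assumption, not the paper's construction: it samples a two-component Gaussian mixture and trains two attention "head" vectors by gradient descent on a reconstruction-style squared-error risk (with a stop-gradient through the attention weights, a simplification chosen for this sketch), showing the heads drifting toward the true centroids.

import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 2000
centroids = np.array([[3.0, 0.0], [-3.0, 0.0]])  # true mixture centroids
labels = rng.integers(0, 2, size=n)
X = centroids[labels] + rng.normal(size=(n, d))  # unlabeled GMM samples

mu_hat = rng.normal(size=(2, d))  # two attention "head" parameter vectors

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

lr = 0.5
for _ in range(200):
    # attention weights of each sample over the two heads (inner-product scores)
    A = softmax(X @ mu_hat.T)        # shape (n, 2)
    recon = A @ mu_hat               # attention output reconstructing the input
    # empirical squared-error risk; gradient taken only through the value path
    grad = A.T @ (recon - X) / n     # shape (2, d)
    mu_hat -= lr * grad

print(np.round(mu_hat, 2))  # heads typically land near the centroids, up to permutation

In a typical run the two rows of mu_hat should end up near (3, 0) and (-3, 0), mirroring the claimed alignment of head parameters with the true mixture centroids under this assumed objective.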
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 23695