Label consistency in overfitted generalized $k$-means

NeurIPS 2021 Poster
Published: 09 Nov 2021, Last Modified: 05 May 2023
Keywords: clustering, k-means, label consistency, manifold clustering, overfitting
TL;DR: We provide theoretical guarantees for label consistency in generalized $k$-means problems, with an emphasis on the overfitted case, where the algorithm uses more clusters than the ground truth contains.
Abstract: We provide theoretical guarantees for label consistency in generalized $k$-means problems, with an emphasis on the overfitted case, where the algorithm uses more clusters than the ground truth contains. We provide conditions under which the estimated labels are close to a refinement of the true cluster labels, that is, a partition in which each estimated cluster lies within a single true cluster. We consider both exact and approximate recovery of the labels. Our results hold for any constant-factor approximation to the $k$-means problem. The results are also model-free: they rely only on bounds on the maximum or average distance of the data points to the true cluster centers. These centers are themselves loosely defined and can be taken to be any set of points for which these distances can be controlled. We illustrate the usefulness of the results with applications to manifold clustering problems.
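The overfitted setting can be illustrated numerically. The sketch below is not the paper's method and makes several assumptions: it uses the standard squared-Euclidean $k$-means objective (the paper's generalized objective may differ), and it fits clusters with Lloyd's algorithm seeded by k-means++, which is an $O(\log k)$ approximation in expectation rather than one of the constant-factor approximations the results cover. The helper `refinement_error` is a hypothetical metric chosen for illustration: the fraction of points that disagree with the majority true label of their estimated cluster. It equals zero exactly when the estimated partition is a refinement of the true one.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distances between rows of X (n, d) and centers C (k, d)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)


def kmeans_cost(X, C):
    """Standard k-means objective: sum of squared distances to the nearest center."""
    return sq_dists(X, C).min(axis=1).sum()


def kmeanspp_init(X, k, rng):
    """k-means++ seeding: sample each new center with probability proportional to D^2."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)


def lloyd(X, k, rng, n_iter=100):
    """Lloyd iterations from k-means++ seeds; an empty cluster keeps its old center."""
    centers = kmeanspp_init(X, k, rng)
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new[j] = members.mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    return sq_dists(X, centers).argmin(axis=1), centers


def refinement_error(est, true):
    """Hypothetical metric: fraction of points outside the majority true label of
    their estimated cluster. It is 0 exactly when `est` refines `true`."""
    est, true = np.asarray(est), np.asarray(true)
    agree = 0
    for j in np.unique(est):
        _, counts = np.unique(true[est == j], return_counts=True)
        agree += counts.max()
    return 1.0 - agree / len(true)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated true clusters, fitted with k = 4 (overfitted).
    true_centers = np.array([[0.0, 0.0], [50.0, 0.0]])
    true_labels = np.repeat([0, 1], 100)
    X = true_centers[true_labels] + rng.normal(size=(200, 2))
    labels, centers = lloyd(X, k=4, rng=rng)
    print("k-means cost:", kmeans_cost(X, centers))
    print("refinement error:", refinement_error(labels, true_labels))
```

With two well-separated blobs and $k = 4$, each fitted cluster should fall inside a single true cluster, so the refinement error should be zero even though the estimated labels do not match the true labels one-to-one. Shrinking the separation between the blobs is a simple way to see the error grow.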
Supplementary Material: pdf
Code: zip