Label consistency in overfitted generalized $k$-meansDownload PDF

21 May 2021, 20:45 (edited 21 Jan 2022)NeurIPS 2021 PosterReaders: Everyone
  • Keywords: clustering, k-means, label consistency, manifold clustering, overfitting
  • TL;DR: We provide theoretical guarantees for label consistency in generalized $k$-means problems, with an emphasis on the overfitted case where the number of clusters used by the algorithm is more than the ground truth.
  • Abstract: We provide theoretical guarantees for label consistency in generalized $k$-means problems, with an emphasis on the overfitted case where the number of clusters used by the algorithm is more than the ground truth. We provide conditions under which the estimated labels are close to a refinement of the true cluster labels. We consider both exact and approximate recovery of the labels. Our results hold for any constant-factor approximation to the $k$-means problem. The results are also model-free and only based on bounds on the maximum or average distance of the data points to the true cluster centers. These centers themselves are loosely defined and can be taken to be any set of points for which the aforementioned distances can be controlled. We show the usefulness of the results with applications to some manifold clustering problems.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: zip
18 Replies

Loading