NYSTROM SAMPLING DEPENDS ON THE EIGENSPECTRUM SHAPE OF THE DATA

Djallel Bouneffouf

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission readers: everyone Show Bibtex
  • Abstract: Spectral clustering has shown a superior performance in analyzing the cluster structure. However, its computational complexity limits its application in analyzing large-scale data. To address this problem, many low-rank matrix approximating algorithms are proposed, including the Nystrom method – an approach with proven approximate error bounds. There are several algorithms that provide recipes to construct Nystrom approximations with variable accuracies and computing times. This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it. Our heuristic depends on the eigenspectrum shape of the dataset, and yields competitive low-rank approximations in test datasets compared to the other state-of-the-art methods.
  • Keywords: Clustering, Nystrom Sampling, landmark point selection

Loading