Unified K-Means Clustering with Label-Guided Manifold Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: This paper introduces a unified manifold-enhanced K-means clustering framework that removes centroid initialization, supports multiple distance metrics, and maximizes an ℓ2,1-norm to encourage balanced clusters, showing top performance on diverse datasets.
Abstract: K-Means clustering is a classical and effective unsupervised learning method, valued for its simplicity and efficiency. However, it faces notable challenges: sensitivity to the random selection of initial centroids, a limited ability to discover the intrinsic manifold structure of nonlinear datasets, and difficulty in producing balanced clusters in practical scenarios. To overcome these weaknesses, we introduce a novel K-Means framework that leverages manifold learning. This approach eliminates the need to compute centroids and instead uses a cluster indicator matrix to align the manifold structures, thereby improving clustering accuracy. Beyond the traditional Euclidean distance, our model incorporates Gaussian kernel distance, K-nearest neighbor distance, and low-pass filtering distance to handle data that are not linearly separable. Furthermore, we introduce a balanced regularizer to achieve balanced clustering results. Detailed experimental results demonstrate the efficacy of the proposed method.
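The claim that K-Means can dispense with centroids rests on a standard identity: within each cluster, the sum of squared distances to the centroid equals the sum of all pairwise squared distances divided by twice the cluster size. In indicator-matrix form this reads J(F) = (1/2) tr((F^T F)^(-1) F^T D F), where F is the n x c one-hot cluster indicator and D is a pairwise distance matrix. The minimal Python sketch below assumes that standard form (the paper's exact objective may differ) and evaluates it from D and the labels alone, checking it against the explicit-centroid K-means cost. Replacing D with a Gaussian kernel-induced distance, 2 - 2 exp(-||x_i - x_j||^2 / (2 sigma^2)), is shown as one illustrative non-Euclidean choice; whether the paper uses exactly this form is an assumption. All function names are hypothetical, and the K-nearest neighbor and low-pass filtering distances are not reproduced.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sq_euclidean_matrix(X: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise squared Euclidean distances: D[i][j] = ||x_i - x_j||^2."""
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]


def gaussian_kernel_distance(X: Sequence[Sequence[float]], sigma: float) -> Matrix:
    """Kernel-induced squared distance for a Gaussian kernel (an assumed form):
    ||phi(x_i) - phi(x_j)||^2 = 2 - 2 * exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    D = sq_euclidean_matrix(X)
    return [[2.0 - 2.0 * math.exp(-d / (2.0 * sigma ** 2)) for d in row] for row in D]


def centroid_free_objective(D: Matrix, labels: Sequence[int], c: int) -> float:
    """sum_k 1/(2 n_k) * sum_{i,j in cluster k} D[i][j].

    Equivalent to (1/2) tr((F^T F)^{-1} F^T D F) for the one-hot indicator F.
    Only D and the labels are used; no centroid is ever computed."""
    total = 0.0
    for k in range(c):
        members = [i for i, lab in enumerate(labels) if lab == k]
        if not members:
            continue  # an empty cluster contributes nothing
        within = sum(D[i][j] for i in members for j in members)
        total += within / (2.0 * len(members))
    return total


def kmeans_objective(X: Sequence[Sequence[float]], labels: Sequence[int], c: int) -> float:
    """Standard K-means sum of squared errors with explicit centroids (for checking)."""
    total = 0.0
    for k in range(c):
        pts = [x for x, lab in zip(X, labels) if lab == k]
        if not pts:
            continue
        dim = len(pts[0])
        mu = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        total += sum(sum((p[d] - mu[d]) ** 2 for d in range(dim)) for p in pts)
    return total


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
    labels = [0, 0, 1, 1, 1]
    print(centroid_free_objective(sq_euclidean_matrix(X), labels, 2))
    print(kmeans_objective(X, labels, 2))
    print(centroid_free_objective(gaussian_kernel_distance(X, 1.0), labels, 2))
```

On the toy data, the first two printed values agree (both equal 11/6 up to floating-point rounding), which is the identity at work. In this sketch the objective depends only on D and the cluster assignment, so any pairwise distance matrix can be substituted without changing the optimization variable.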
Lay Summary: K-Means clustering is a classical and effective unsupervised learning method, valued for its simplicity and efficiency. However, it faces notable challenges: sensitivity to the random selection of initial centroids, a limited ability to discover the intrinsic manifold structure of nonlinear datasets, and difficulty in producing balanced clusters in practical scenarios. To overcome these weaknesses, we introduce a novel K-Means framework that leverages manifold learning. This approach eliminates the need to compute centroids and instead uses a cluster indicator matrix to align the manifold structures. The method also supports several distance metrics beyond Euclidean distance to handle data that are not linearly separable, and it uses a balanced regularizer to produce balanced clusters, which further improves clustering performance.
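The TL;DR states that balance is obtained by maximizing an ℓ2,1-norm. One plausible reading, assumed here rather than taken from the paper, applies the norm to the columns of the one-hot indicator F: column k contains n_k ones, so the term equals the sum over k of sqrt(n_k). For a fixed total n this sum is largest when all clusters have equal size, because the square root is concave. The minimal sketch below computes that term and a combined score in which it is subtracted from a clustering cost; the helper names and the trade-off weight `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def indicator_matrix(labels: Sequence[int], c: int) -> List[List[int]]:
    """One-hot cluster indicator F (n x c): F[i][k] = 1 iff point i is in cluster k."""
    return [[1 if lab == k else 0 for k in range(c)] for lab in labels]


def l21_norm_of_columns(F: Sequence[Sequence[float]]) -> float:
    """Sum of the l2 norms of the columns of F, i.e. ||F^T||_{2,1}.

    For a one-hot F this equals sum_k sqrt(n_k), where n_k is the size of cluster k."""
    c = len(F[0])
    return sum(math.sqrt(sum(row[k] ** 2 for row in F)) for k in range(c))


def balanced_objective(base_cost: float, F: Sequence[Sequence[float]], lam: float) -> float:
    """Hypothetical combined score to minimize: clustering cost minus lam times the
    l2,1 term. A larger sum_k sqrt(n_k) (more even cluster sizes) lowers the score."""
    return base_cost - lam * l21_norm_of_columns(F)


if __name__ == "__main__":
    for labels in ([0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0]):
        F = indicator_matrix(labels, 2)
        print(labels, round(l21_norm_of_columns(F), 3))
```

For n = 6 and c = 2, sizes (3, 3) give 2 sqrt(3) ≈ 3.46, sizes (5, 1) give sqrt(5) + 1 ≈ 3.24, and (6, 0) gives sqrt(6) ≈ 2.45, so under this reading the regularizer pushes assignments toward even splits.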
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: General Machine Learning->Clustering
Keywords: Balanced clustering, unsupervised learning, low-pass filtering distance.
Submission Number: 2499