Keywords: unsupervised learning, clustering, weighted k-means, silhouette score
TL;DR: We introduce K-Sil, a silhouette-driven, instance-weighted k-means variant that prioritizes high-silhouette (well-clustered) points and down-weights noisy or borderline ones, yielding higher silhouette scores and better-separated clusters.
Abstract: Clustering is a fundamental unsupervised learning task, yet widely used algorithms such as k-means remain sensitive to outliers and imbalanced data, often yielding distorted centroids and poor partitions. We propose K-Sil, a silhouette-guided variant of k-means that assigns each instance a weight proportional to its silhouette score. This emphasizes well-clustered points while suppressing ambiguous or noisy regions. K-Sil supports both macro- and micro-averaged silhouette aggregation through adaptive weighting schemes and achieves scalability via efficient sampling and approximation strategies. We establish theoretical guarantees on centroid convergence and validate the method on synthetic and real-world datasets. Across settings, K-Sil consistently achieves higher silhouette scores than k-means and existing instance-weighted extensions, demonstrating its effectiveness for learning high-quality, well-separated clusters.
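To make the core idea concrete, here is a minimal sketch of a silhouette-weighted k-means loop. It is not the authors' implementation. It computes exact silhouettes in O(n^2) time, with no sampling or approximation, and the mapping from a silhouette s in [-1, 1] to a positive weight (s + 1) / 2 is an assumption chosen for illustration, not the paper's adaptive weighting scheme. The names `silhouette_values` and `silhouette_weighted_kmeans` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_values(points, labels, k):
    """Exact per-point silhouette s = (b - a) / max(a, b); O(n^2) distances."""
    n = len(points)
    scores = []
    for i in range(n):
        sums = [0.0] * k
        counts = [0] * k
        for j in range(n):
            if i != j:
                sums[labels[j]] += dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in range(k)
                  if c != own and counts[c] > 0]
        if counts[own] == 0 or not others:
            scores.append(0.0)  # convention for singleton or lone clusters
            continue
        a = sums[own] / counts[own]   # mean distance within own cluster
        b = min(others)               # mean distance to the nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_weighted_kmeans(points, k, n_iter=20, seed=0):
    """k-means whose centroid update is a silhouette-weighted mean (sketch)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    centroids = [list(p) for p in rng.sample(points, k)]
    labels, weights = [0] * n, [1.0] * n
    for _ in range(n_iter):
        # Assignment step: nearest centroid, as in standard k-means.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Weighting step (assumed mapping): shift silhouettes into (0, 1].
        sil = silhouette_values(points, labels, k)
        weights = [(s + 1.0) / 2.0 + 1e-9 for s in sil]
        # Update step: weighted mean, so borderline points pull centroids less.
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = sum(weights[i] for i in members)
            centroids[c] = [
                sum(weights[i] * points[i][d] for i in members) / total
                for d in range(dim)
            ]
    return labels, centroids, weights
```

Calling `silhouette_weighted_kmeans(points, k=2)` on a list of coordinate tuples returns the final labels, the centroids, and the per-point weights used in the last update. Points with negative silhouette receive weights below 0.5, so they contribute less to their centroid than well-clustered points do.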
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 13592