IO-K-Means

Shuntai Zhang, Ming Yang, Tao Zhang, Jiayao Wang, Yishu Zhao, Wei Pang, Yizhang Wang

Published: 01 Jan 2026, Last Modified: 15 Jan 2026. Venue: Advanced Data Mining and Applications, ADMA 2025. License: CC BY-SA 4.0

Abstract: K-means is a widely used unsupervised learning algorithm, but its clustering performance depends heavily on the choice of initial centroids, and it easily falls into local optima. To address these limitations, we propose IO-K-means, an improved K-means algorithm based on iterative centroid optimization. First, instead of random initialization, we employ reverse nearest-neighbor relationships (RNNs) to select higher-quality initial centroids, giving a more representative starting point. Second, we introduce an iterative refinement mechanism: in each iteration, a novel within-cluster compactness measure identifies which centroids require adjustment, and two operations, interconnection and perturbation, are applied to fine-tune their positions. Over multiple iterations, the algorithm progressively improves centroid placement and finally assigns each data point to its nearest centroid to produce the clustering. To evaluate the proposed IO-K-means, we conduct experiments on 16 real-world datasets. The experimental results show that IO-K-means outperforms state-of-the-art (SOTA) extensions of K-means, including Nie’s work (TKDE 2022).
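The abstract does not spell out how reverse nearest-neighbor relationships are turned into initial centroids, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes that a point's reverse-neighbor count (how many other points list it among their k nearest neighbors) serves as a density proxy, and it uses a hypothetical score, count times squared distance to the nearest already-chosen seed, to spread the seeds out. The function names, the `n_neighbors` parameter, and the scoring rule are all illustrative assumptions. The compactness measure and the interconnection and perturbation operations are not defined in the abstract and are therefore not sketched here; only the final nearest-centroid assignment is shown.

```python
import numpy as np


def reverse_neighbor_counts(X, n_neighbors=5):
    """Return (counts, d2): counts[i] is how many other points have i among
    their n_neighbors nearest neighbors; d2 is the squared-distance matrix."""
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    knn = np.argsort(d2, axis=1)[:, :n_neighbors]
    counts = np.bincount(knn.ravel(), minlength=X.shape[0])
    return counts, d2


def select_initial_centroids(X, k, n_neighbors=5):
    """Pick k seed points (hypothetical rule, not taken from the paper)."""
    counts, d2 = reverse_neighbor_counts(X, n_neighbors)
    chosen = [int(np.argmax(counts))]  # start from the most "popular" point
    min_d2 = d2[chosen[0]].copy()      # squared distance to nearest seed
    min_d2[chosen[0]] = 0.0
    while len(chosen) < k:
        # Assumed score: favor points that are popular and far from all seeds.
        score = counts * min_d2
        score[chosen] = -1.0           # never re-select an existing seed
        nxt = int(np.argmax(score))
        chosen.append(nxt)
        new_d2 = d2[nxt].copy()
        new_d2[nxt] = 0.0
        min_d2 = np.minimum(min_d2, new_d2)
    return X[chosen].copy()


def assign_to_nearest(X, centroids):
    """Final step named in the abstract: each point goes to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)
```

A call such as `assign_to_nearest(X, select_initial_centroids(X, k=3))` gives a labeling from the seeds alone. The pairwise distance matrix costs O(n²) memory, so a tree-based or approximate neighbor search would be needed for larger datasets.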

External IDs: doi:10.1007/978-981-95-3462-3_15