Automatic centroid initialization in k-means using artificial hummingbird algorithm

Published: 01 Jan 2025, Last Modified: 12 Jun 2025Neural Comput. Appl. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: K-means is a widely used technique that heavily relies on the initial cluster centroid location. Poorly chosen centroids can cause the algorithm to get trapped in suboptimal solutions. Additionally, determining the optimal number of clusters for large datasets is computationally expensive. To address these challenges, a recently developed Artificial Hummingbird Algorithm (AHA) is used to initialize cluster centroid locations and automatically determine the best estimate for the number of clusters. AHA simulates the specialized flight skills and intelligent foraging strategies of hummingbirds, striking a fine balance between exploration and exploitation during the search process. Unlike other data clustering approaches that use a fixed threshold in heuristic methods, we propose a dynamic threshold based on the variance of the data with respect to its centroids for activating cluster centroids in AHA. The data are automatically partitioned into k cluster centroids such that cohesion, measured by cluster diameters, and separation, measured by nearest neighbor distance, are optimized. The algorithm is tested on various datasets, including real-world data, fundamental clustering benchmarks, synthetic data, and high-dimensional data. To evaluate performance, metrics such as fitness value, inter-cluster distance, and intra-cluster distance were used. Results indicate that the proposed method ranked first and achieved superior clustering performance compared to state-of-the-art algorithms.
Loading