Abstract: Clustering is an unsupervised machine learning task that aims to discover natural groups in a given dataset. K-mode algorithms, which adapt K-means algorithms for continuous data to the categorical setting, are among the most popular algorithms for discovering clusters in categorical data. In this paper, we present results on how to accelerate them using the triangle inequality while still computing exactly the same result as the original K-mode algorithms. We also provide empirical evidence to illustrate the potential gains from leveraging the triangle inequality. Finally, we outline future work aimed at a comprehensive understanding of how the triangle inequality can accelerate other clustering algorithms for categorical data.
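To illustrate the kind of pruning the abstract refers to, the following is a minimal sketch and not the paper's algorithm. It assumes the simple matching (Hamming) dissimilarity, which is a metric, and applies the standard Elkan-style bound: if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so the distance from x to c' need not be computed. All function names (`hamming`, `mode_distances`, `assign_brute`, `assign_pruned`) and the toy data are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(u != v for u, v in zip(a, b))


def mode_distances(modes):
    """Pairwise distances between cluster modes, computed once per iteration."""
    k = len(modes)
    dist = [[0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        dist[i][j] = dist[j][i] = hamming(modes[i], modes[j])
    return dist


def assign_brute(x, modes):
    """Baseline: compare x against every mode; the first minimum wins ties."""
    best, best_d = 0, hamming(x, modes[0])
    for j in range(1, len(modes)):
        d = hamming(x, modes[j])
        if d < best_d:
            best, best_d = j, d
    return best, len(modes)


def assign_pruned(x, modes, mode_dist):
    """Same assignment, skipping modes ruled out by the triangle inequality.

    Returns the chosen mode index and the number of point-to-mode
    distances actually evaluated.
    """
    best, best_d = 0, hamming(x, modes[0])
    evaluated = 1
    for j in range(1, len(modes)):
        # d(best, j) >= 2 * d(x, best) implies d(x, j) >= d(x, best),
        # so mode j cannot strictly improve on the current best.
        if mode_dist[best][j] >= 2 * best_d:
            continue
        d = hamming(x, modes[j])
        evaluated += 1
        if d < best_d:
            best, best_d = j, d
    return best, evaluated


# Toy categorical data (assumed for illustration only).
modes = [("a", "x", "p", "m"), ("b", "y", "q", "n"), ("a", "y", "q", "m")]
points = [("a", "x", "p", "m"), ("b", "y", "q", "m"), ("a", "x", "q", "n")]
mode_dist = mode_distances(modes)
for x in points:
    print(x, assign_brute(x, modes)[0], assign_pruned(x, modes, mode_dist))
```

Because a mode is skipped only when it provably cannot beat the current best, the pruned assignment matches the brute-force one, including on ties, while evaluating at most as many distances.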
External IDs:dblp:conf/sum/NguyenMH24