An Algorithm for Clustering Categorical Data With Set-Valued Features

Published: 2018, Last Modified: 15 May 2023. Venue: IEEE Trans. Neural Networks Learn. Syst. 2018.
Abstract: In data mining, objects are often represented by a set of features, where each feature of an object has only one value. In reality, however, some features can take on multiple values; for instance, a person may have several job titles, hobbies, and email addresses. Such features can be referred to as set-valued features, and they are often converted to dummy features so that existing data mining algorithms can be applied. In this paper, we propose the SV-k-modes algorithm, which clusters categorical data with set-valued features. In this algorithm, a distance function is defined between two objects with set-valued features, and a set-valued mode representation of cluster centers is proposed. We develop a heuristic method to update cluster centers in the iterative clustering process and an initialization algorithm to select the initial cluster centers. The convergence and complexity of the SV-k-modes algorithm are analyzed. Experiments are conducted on both synthetic data and real data from five different applications. The experimental results show that the SV-k-modes algorithm performs better than three other categorical clustering algorithms when clustering real data, and that the algorithm is scalable to large data.
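To make the idea of a distance between objects with set-valued features concrete, here is a minimal Python sketch. The abstract does not state the distance function, so this sketch assumes a per-feature Jaccard distance summed over all features; the names `jaccard_distance`, `sv_distance`, and the sample objects are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Sequence

# An object is a sequence of set-valued features, e.g. (job titles, hobbies).
SetValuedObject = Sequence[FrozenSet[str]]


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sv_distance(x: SetValuedObject, y: SetValuedObject) -> float:
    """Sum the per-feature Jaccard distances (an assumed choice of distance)."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of features")
    return sum(jaccard_distance(a, b) for a, b in zip(x, y))


# Hypothetical objects with two set-valued features: job titles and hobbies.
alice = (frozenset({"engineer", "lecturer"}), frozenset({"chess", "hiking"}))
bob = (frozenset({"engineer"}), frozenset({"chess", "hiking"}))

print(sv_distance(alice, bob))  # job titles share 1 of 2 values, hobbies match
```

Keeping each feature as a set, rather than expanding it into one dummy column per value, lets the distance compare features as wholes; whether this matches the paper's exact definition should be checked against the full text.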