Clustering Distributed Homogeneous Datasets

Srinivasan Parthasarathy, Mitsunori Ogihara

Published: 2000, Last Modified: 10 Oct 2023, PKDD 2000, Readers: Everyone

Abstract: In this paper we present an elegant and effective algorithm for measuring the similarity between homogeneous datasets to enable clustering. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for a given cluster. The algorithm presented is efficient in storage and scale, has the ability to adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. The proposed similarity measure is evaluated and validated on real datasets from the Census Bureau and Reuters, and on synthetic datasets from IBM.
