Abstract: Problems in data analysis often require the unsupervised partitioning of a dataset into clusters. Many methods exist for such partitioning, but most are either model-based (typically assuming hyper-ellipsoidal clusters) or computationally infeasible in data spaces of more than three dimensions. We reconsider cluster analysis in information-theoretic terms and show that minimisation of partition entropy can be used to estimate the number and structure of probable data generators.
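To make the central quantity concrete, the sketch below computes a partition entropy for a soft partition. It assumes one common definition (Bezdek's mean partition entropy, the average over points of -sum_k p(k|x_i) log p(k|x_i)); the formulation used in this work may differ. The function name `partition_entropy` and the toy posterior matrices are illustrative only. The sketch shows the property the abstract relies on: crisper, more certain partitions have lower entropy.

```python
import math


def partition_entropy(posteriors):
    """Mean partition entropy of a soft partition (assumed definition).

    posteriors[i][k] is taken to be p(k | x_i), the probability that
    data point i belongs to cluster k; each row should sum to 1.
    """
    total = 0.0
    for row in posteriors:
        # Use the convention 0 * log 0 = 0 by skipping zero entries.
        total -= sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(posteriors)


# Hypothetical posteriors for three points and two clusters.
crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # every point certain
fuzzy = [[0.5, 0.5], [0.6, 0.4], [0.5, 0.5]]  # points nearly undecided

print(partition_entropy(crisp))  # 0.0
print(partition_entropy(fuzzy))  # larger: less certain partition
```

A fully certain partition scores zero, while maximally ambiguous assignments over K clusters approach log K per point, which is why lower entropy can serve as a criterion for preferring one candidate partition over another.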