Maximum certainty data partitioning

Stephen J. Roberts, Richard M. Everson, Iead Rezek

Published: 2000, Last Modified: 14 May 2025. Pattern Recognit. 2000. CC BY-SA 4.0

Abstract: Problems in data analysis often require the unsupervised partitioning of a dataset into clusters. Many methods exist for such partitioning but most have the weakness of being model-based (most assuming hyper-ellipsoidal clusters) or computationally infeasible in anything more than a three-dimensional data space. We re-consider the notion of cluster analysis in information-theoretic terms and show that minimisation of partition entropy can be used to estimate the number and structure of probable data generators.
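The abstract does not define partition entropy, so the following is only an illustrative sketch of the idea. It assumes each data point comes with a vector of cluster membership probabilities and takes partition entropy to be the mean Shannon entropy of those vectors. The function name `partition_entropy` and the toy probability tables are hypothetical and not taken from the paper. Under that assumption, a partition that assigns every point to one cluster with certainty has zero entropy, while a maximally uncertain two-way partition has entropy log 2.

```python
import math

def partition_entropy(posteriors):
    """Mean Shannon entropy (in nats) of per-point cluster membership probabilities.

    `posteriors` is a list of rows, one per data point; each row holds that
    point's membership probabilities over the clusters (assumed to sum to 1).
    This definition is an assumption made for illustration.
    """
    total = 0.0
    for row in posteriors:
        # Zero probabilities are skipped: p * log(p) tends to 0 as p tends to 0.
        total -= sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(posteriors)

# Hypothetical membership tables for three points and two clusters.
confident = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # every point assigned with certainty
ambiguous = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # every point equally split

print(partition_entropy(confident))
print(partition_entropy(ambiguous))
```

Comparing the two tables shows why a low value of this quantity corresponds to a confident partitioning: the certain assignment scores lower than the evenly split one. How the paper actually estimates the membership probabilities and selects the number of clusters is not stated in the abstract and is not reproduced here.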