A general probabilistic framework for clustering individuals and objects

KDD 2000 (modified: 16 Jul 2019)
Abstract: This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For example, one might have data from a set of medical patients where, for each patient, one has a set of observed time-series, each potentially of different length and sampled at a different rate. We propose a general model-based probabilistic framework for clustering data of this form, which are non-vector in nature and may vary in size from individual to individual. We discuss the Expectation-Maximization (EM) procedure for clustering within this framework and how it can be applied in a general manner to clustering sequences, time-series, trajectories, and other non-vector data. We show that a number of earlier algorithms can be viewed as special cases within this unifying framework. The paper concludes with several illustrations of the method, including clustering of red blood cell data in a medical diagnosis context, clustering of proteins from curves of gene expression data, and clustering of individuals based on their sequences of Web navigation.
General Terms: Clustering, Mixture Models, EM Algorithm
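To make the EM idea concrete for non-vector data, here is a minimal sketch, not the paper's code. It assumes one particular component model: a mixture of first-order Markov chains over discrete states, the kind of model one might use for Web navigation sequences. The function names `normalize`, `log_seq_prob`, and `em_markov_mixture`, and the smoothing pseudo-count `alpha`, are hypothetical choices made for this illustration. The key point is that each sequence enters EM only through its likelihood under each component, so sequences of different lengths never need to be forced into fixed-length vectors.

```python
import math
import random


def normalize(values):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def log_seq_prob(seq, init, trans):
    """Log-likelihood of one non-empty sequence under a first-order Markov chain."""
    lp = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(trans[a][b])
    return lp


def em_markov_mixture(seqs, n_states, k, n_iter=50, alpha=1e-2, seed=0):
    """EM for a K-component mixture of first-order Markov chains (sketch).

    seqs: list of non-empty integer sequences over range(n_states); lengths may differ.
    alpha: hypothetical pseudo-count added to every count so that all estimated
    probabilities stay strictly positive (acts like a weak Dirichlet prior).
    n_iter must be at least 1.
    """
    rng = random.Random(seed)
    # Start from random soft memberships so the components can diverge.
    resp = [normalize([rng.random() + 0.1 for _ in range(k)]) for _ in seqs]
    for _ in range(n_iter):
        # M-step: re-estimate weights and chains from membership-weighted counts.
        weights = normalize([alpha + sum(r[j] for r in resp) for j in range(k)])
        inits, trans = [], []
        for j in range(k):
            init_counts = [alpha] * n_states
            trans_counts = [[alpha] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                init_counts[seq[0]] += r[j]
                for a, b in zip(seq, seq[1:]):
                    trans_counts[a][b] += r[j]
            inits.append(normalize(init_counts))
            trans.append([normalize(row) for row in trans_counts])
        # E-step: posterior membership of each whole sequence (log-sum-exp for stability).
        loglik = 0.0
        for i, seq in enumerate(seqs):
            logs = [math.log(weights[j]) + log_seq_prob(seq, inits[j], trans[j])
                    for j in range(k)]
            m = max(logs)
            log_total = m + math.log(sum(math.exp(l - m) for l in logs))
            resp[i] = [math.exp(l - log_total) for l in logs]
            loglik += log_total
    return weights, inits, trans, resp, loglik


# Usage: four sequences of different lengths over two states.
seqs = [[0, 1, 0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 0, 0, 0], [1, 1, 1]]
weights, inits, trans, resp, loglik = em_markov_mixture(seqs, n_states=2, k=2)
for seq, r in zip(seqs, resp):
    print(seq, "-> cluster", max(range(len(r)), key=r.__getitem__))
```

Under this design, swapping in a different component model (for example, a regression model for time-series curves) would change only the per-object likelihood and the weighted M-step estimates; the E-step and mixture weights stay the same.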