Semi-Parametric Model-Based Clustering for DNA Microarray Data

Published: 2006, Last Modified: 10 Oct 2024, ICPR (3) 2006, CC BY-SA 4.0
Abstract: Various clustering methods have been proposed for the analysis of gene expression data, but conventional clustering algorithms share a critical limitation: they require the user to set parameters such as the number of clusters and the initial cluster centers. In this paper, we propose a semi-parametric model-based clustering algorithm in which the underlying model is a mixture of Gaussians. Each gene expression data point defines a Gaussian kernel, so the uncertainty of microarray data is naturally integrated into the data representation. Our algorithm provides a principled method to automatically determine the parameters (the number of components in the mixture and the mean, covariance, and weight of each Gaussian) using the mean-shift procedure [2] and curvature fitting. After this initialization, the Expectation Maximization (EM) algorithm is employed for clustering to obtain a Maximum Likelihood (ML) estimate. The performance of our algorithm is compared with that of the standard EM algorithm on both real and synthetic data.
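The two-stage idea in the abstract (mode seeking to initialize the mixture, then EM refinement) can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation. Several simplifying assumptions are made here: a single fixed kernel bandwidth stands in for the per-gene uncertainty kernels, the initial variance of each component is the sample variance of the points nearest each mode rather than the curvature fit the paper describes, and the function names (`mean_shift_modes`, `init_from_modes`, `em`) and the synthetic data are hypothetical.

```python
import math
import random


def gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mean_shift_modes(data, bandwidth, tol=1e-6, max_iter=500):
    """Run Gaussian-kernel mean shift from every point and merge nearby endpoints."""
    modes = []
    for x in data:
        for _ in range(max_iter):
            w = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
            x_new = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
            if abs(x_new - x) < tol:
                break
            x = x_new
        # Assumption: endpoints closer than one bandwidth belong to the same mode.
        if all(abs(x - m) > bandwidth for m in modes):
            modes.append(x)
    return sorted(modes)


def init_from_modes(data, modes):
    """Build (weight, mean, variance) triples by assigning points to the nearest mode.

    Assumption: sample variance replaces the curvature fitting used in the paper.
    """
    groups = [[] for _ in modes]
    for x in data:
        k = min(range(len(modes)), key=lambda j: abs(x - modes[j]))
        groups[k].append(x)
    params = []
    for m, g in zip(modes, groups):
        if not g:
            continue  # skip a mode that attracted no points
        var = sum((x - m) ** 2 for x in g) / len(g)
        params.append((len(g) / len(data), m, max(var, 1e-6)))
    return params


def em(data, params, n_iter=50):
    """Standard EM updates for a 1-D Gaussian mixture, starting from params."""
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gauss(x, mu, var) for w, mu, var in params]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        new_params = []
        for k in range(len(params)):
            nk = sum(r[k] for r in resp)
            mu = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, data)) / nk
            new_params.append((nk / len(data), mu, max(var, 1e-6)))
        params = new_params
    return params


# Hypothetical synthetic data: two well-separated groups of 100 points each.
random.seed(0)
data = [random.gauss(0.0, 0.5) for _ in range(100)]
data += [random.gauss(5.0, 0.5) for _ in range(100)]

modes = mean_shift_modes(data, bandwidth=1.0)    # number of components is found here
params = em(data, init_from_modes(data, modes))  # EM refines the initial estimate
for weight, mean, var in params:
    print(f"weight={weight:.3f} mean={mean:.3f} var={var:.3f}")
```

The point of the sketch is that the number of components is never passed in: it falls out of how many distinct modes the mean-shift stage finds, and EM only refines that starting point. The bandwidth still has to be chosen, which is the main place where this toy version departs from the data-driven kernels described in the abstract.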