Abstract: Previous work on pattern discovery focuses only on grouping objects under the same subset of dimensions. As a result, time-shifting, a biologically interesting pattern, is overlooked in the analysis of time series gene expression data. In this paper, we propose a new definition of coherent cluster for time series gene expression data, called ts-cluster. The proposed model (1) allows the expression profiles of genes in a cluster to be coherent on different subsets of dimensions, i.e. these genes follow a certain time-shifting relationship, and (2) considers relative rather than absolute expression magnitude, which makes it tolerant of the negative impact of “noise”. Such patterns, which previous research has missed, facilitate the study of regulatory relationships between genes. A novel algorithm is also presented and implemented to mine all significant ts-clusters. Experimental results on both synthetic and real datasets show that the ts-cluster algorithm efficiently detects a significant number of clusters missed by previous models, and that these clusters are potentially of high biological significance.
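To make the two ideas in the abstract concrete, the following is a minimal sketch of what a time-shifted, magnitude-relative match between two genes could look like. It is not the paper's ts-cluster definition or its mining algorithm, neither of which is given in the abstract. The function names `trend` and `matching_shifts`, the parameters `threshold`, `max_shift`, and `min_overlap`, and the sample profiles are all hypothetical and chosen only for illustration. Each profile is reduced to a sequence of up/down/flat codes (relative rather than absolute magnitude), and the second gene is tested against the first at several time lags.

```python
def trend(profile, threshold=0.0):
    """Encode each consecutive change as +1 (up), -1 (down), or 0 (flat).

    Only the direction of change is kept, so absolute expression
    levels do not matter. `threshold` is a hypothetical noise tolerance.
    """
    codes = []
    for prev, curr in zip(profile, profile[1:]):
        diff = curr - prev
        if diff > threshold:
            codes.append(1)
        elif diff < -threshold:
            codes.append(-1)
        else:
            codes.append(0)
    return codes


def matching_shifts(leader, follower, max_shift, min_overlap=3, threshold=0.0):
    """Return every lag s in 0..max_shift at which the follower's trend,
    delayed by s time points, equals the leader's trend on the overlap.

    Lags whose overlap is shorter than `min_overlap` changes are skipped.
    """
    lead = trend(leader, threshold)
    follow = trend(follower, threshold)
    shifts = []
    for shift in range(max_shift + 1):
        overlap = min(len(lead), len(follow) - shift)
        if overlap < min_overlap:
            continue
        if lead[:overlap] == follow[shift:shift + overlap]:
            shifts.append(shift)
    return shifts


# Illustrative (made-up) profiles: gene_b repeats gene_a's rise/fall
# pattern one time point later and at a different expression level.
gene_a = [1.0, 3.0, 2.0, 5.0, 4.0, 4.5]
gene_b = [10.0, 10.5, 12.0, 11.0, 15.0, 14.0]

print(matching_shifts(gene_a, gene_b, max_shift=2))  # [1]
```

A model that only compares genes on the same time points corresponds to testing lag 0 alone, which finds no match for this pair. Allowing a lag recovers the relationship, which is the kind of pattern the abstract says earlier models miss.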