Abstract (Author summary): Transcriptome-wide measurement of gene expression dynamics can reveal regulatory mechanisms that control how cells respond to changes in the environment. Such measurements may identify hundreds to thousands of responsive genes. Clustering genes with similar dynamics reveals a smaller set of response types that can then be explored and analyzed for distinct functions. Two challenges in clustering time series gene expression data are selecting the number of clusters and modeling dependencies in gene expression levels between time points. We present a methodology, DPGP, in which a Dirichlet process clusters the trajectories of gene expression levels across time, where the trajectories are modeled using a Gaussian process. We compare the performance of DPGP with that of state-of-the-art time series clustering methods across a variety of simulated data sets. We apply DPGP to published microbial expression data and find that it recapitulates known expression regulation with minimal user input. We then use DPGP to identify novel human gene expression responses to the widely prescribed synthetic glucocorticoid hormone dexamethasone. We find distinct clusters of responsive transcripts that are validated by between-cluster differences in transcription factor binding and histone modifications. These results demonstrate that DPGP can be used for exploratory data analysis of gene expression time series to reveal novel insights into biomedically important gene regulatory processes.