ChIP-GMM: A Gaussian Mixture Model for Inferring Binding Regions in ChIP-seq Profiles

Published: 04 Mar 2017, Last Modified: 11 Feb 2024, BICOB, CC BY 4.0
Abstract: Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) enables genome-wide mapping of transcription-factor binding sites (TFBS). Several transcription factors (TFs) are known to differentiate tumor subtypes in diseases such as cancer. For instance, the Luminal A and Luminal B subtypes of breast cancer are high in estrogen receptor (ER), while tumors of the human epidermal growth factor receptor 2 (HER2) subtype are high in HER2 protein. Accurately mapping DNA-protein binding loci is important for determining the causal role of epigenetic regulation of gene expression under both normal and disease conditions, and in turn for developing targeted drug therapies. In this paper, we apply the popular variational Bayes framework for Gaussian mixture models and demonstrate its effectiveness in identifying TFBS as well as common regions co-regulated by multiple TFs. We show that our method compares favorably with existing peak-calling and clustering methods. The proposed method can be used both for peak calling and for clustering co-regulated genomic regions targeted by multiple TFs.
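The abstract names the variational Bayes framework for Gaussian mixtures without giving its updates. As a rough illustration only, the pure-Python sketch below applies the standard mean-field updates for a one-dimensional Bayesian Gaussian mixture (Dirichlet prior on the weights, Normal-Gamma prior on each component's mean and precision) to read positions and reports high-weight components as candidate peaks. This is not the authors' implementation: the 1-D read-coordinate input, the prior values (`alpha0`, `beta0`, `a0`, `b0`), the number of components `k`, the quantile initialisation, the 0.05 weight threshold, the simulated binding sites at 1000 bp and 1600 bp, and the names `vb_gmm_1d` and `digamma` are all hypothetical choices made for this example.

```python
import math
import random


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def vb_gmm_1d(x, k, iters=100, alpha0=1e-3, beta0=1e-3, a0=1.0, b0=1.0):
    """Mean-field variational Bayes for a 1-D Gaussian mixture (hypothetical helper).

    Priors (illustrative assumptions): pi ~ Dirichlet(alpha0),
    mu_k | lam_k ~ N(m0, 1/(beta0*lam_k)), lam_k ~ Gamma(shape=a0, rate=b0).
    Returns (expected weights, posterior means, approximate sds).
    """
    n = len(x)
    m0 = sum(x) / n
    # Deterministic initialisation: hard-assign each read to the nearest of k quantiles.
    xs = sorted(x)
    centers = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    r = []
    for xi in x:
        nearest = min(range(k), key=lambda c: abs(xi - centers[c]))
        r.append([1.0 if j == nearest else 0.0 for j in range(k)])

    for _ in range(iters):
        # Update q(pi) and q(mu_k, lam_k) from the current responsibilities.
        alpha, beta, m, a, b = [], [], [], [], []
        for j in range(k):
            nk = sum(row[j] for row in r)
            if nk > 1e-10:
                xbar = sum(row[j] * xi for row, xi in zip(r, x)) / nk
                sk = sum(row[j] * (xi - xbar) ** 2 for row, xi in zip(r, x)) / nk
            else:  # empty component: fall back to the prior
                xbar, sk = m0, 0.0
            alpha.append(alpha0 + nk)
            beta.append(beta0 + nk)
            m.append((beta0 * m0 + nk * xbar) / (beta0 + nk))
            a.append(a0 + 0.5 * nk)
            b.append(b0 + 0.5 * (nk * sk + beta0 * nk / (beta0 + nk) * (xbar - m0) ** 2))
        # Update responsibilities using expectations under q.
        psi_total = digamma(sum(alpha))
        e_ln_pi = [digamma(al) - psi_total for al in alpha]
        e_ln_lam = [digamma(aj) - math.log(bj) for aj, bj in zip(a, b)]
        r = []
        for xi in x:
            logs = [e_ln_pi[j] + 0.5 * e_ln_lam[j] - 0.5 * math.log(2 * math.pi)
                    - 0.5 * (1.0 / beta[j] + a[j] / b[j] * (xi - m[j]) ** 2)
                    for j in range(k)]
            top = max(logs)  # log-sum-exp for numerical stability
            w = [math.exp(v - top) for v in logs]
            z = sum(w)
            r.append([v / z for v in w])

    total = sum(alpha)
    weights = [al / total for al in alpha]
    sds = [math.sqrt(bj / aj) for aj, bj in zip(a, b)]  # 1/sqrt(E[lam_k])
    return weights, m, sds


# Hypothetical ChIP-seq read positions around two binding sites (1000 bp and 1600 bp).
rng = random.Random(0)
reads = [rng.gauss(1000, 40) for _ in range(300)] + [rng.gauss(1600, 40) for _ in range(300)]
weights, means, sds = vb_gmm_1d(reads, k=4)
for w, mu, sd in zip(weights, means, sds):
    if w > 0.05:  # treat components with non-negligible weight as candidate peaks
        print(f"peak near {mu:.0f} bp, sd ~{sd:.0f} bp, weight {w:.2f}")
```

With a small `alpha0`, a component that loses all of its reads keeps only about `alpha0 / (k * alpha0 + N)` of the total weight, so the threshold filters it out; this is one way a variational fit can effectively choose the number of peaks. For real data, a tested implementation such as scikit-learn's `BayesianGaussianMixture` would likely be a more reliable starting point than this sketch.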