CINNAMON: A Convolutional Neural Network Framework for Extracting Regulatory Motif Insights from Epigenomics Data
Keywords: Convolutional Neural Networks, Epigenomics, Explainable AI, Regulatory Motifs
TL;DR: CINNAMON is a convolutional neural network-based computational framework that combines explainable AI methods with epigenomics data to provide insights into regulatory motif relationships.
Abstract: Mammalian cell development requires differential activation of transcription factors (TFs) to establish lineage-discriminating gene expression. While some TFs have been characterized as pioneer regulators in previous studies [1,2], a comprehensive understanding of the dynamic TF interactions during developmental processes remains elusive. In our study, we used experimental data (ATAC-seq and ChIP-seq, among others) that encompass regulatory regions, such as enhancers and promoters, where TFs bind and regulate gene expression. The experimentally derived chromatin profile, along with the underlying sequence and evolutionary conservation, was used as input to an in-house Convolutional Neural Network (CNN) model trained to distinguish between regulatory regions that exhibit developmental stage-specific activity.
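As an illustration of how sequence, conservation, and chromatin signal could be combined into a single CNN input, the following is a minimal sketch. It assumes per-base tracks stacked as extra channels next to a one-hot DNA encoding; the function name `encode_region`, the channel order, and the example values are hypothetical and are not taken from the CINNAMON implementation.

```python
import numpy as np

BASES = "ACGT"

def encode_region(sequence, conservation, accessibility):
    """Stack one-hot DNA, conservation, and chromatin signal into a (length, 6) array.

    Hypothetical layout: columns 0-3 are A, C, G, T; column 4 is a per-base
    conservation score; column 5 is a per-base chromatin signal (e.g. ATAC-seq).
    """
    seq = sequence.upper()
    length = len(seq)
    if len(conservation) != length or len(accessibility) != length:
        raise ValueError("all tracks must have the same length as the sequence")

    one_hot = np.zeros((length, 4), dtype=np.float32)
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:  # ambiguous bases such as N stay all-zero
            one_hot[i, j] = 1.0

    tracks = np.stack([conservation, accessibility], axis=1).astype(np.float32)
    return np.concatenate([one_hot, tracks], axis=1)

# Toy region of five bases with made-up track values.
x = encode_region("ACGTN", [0.9, 0.1, 0.5, 0.7, 0.0], [3.0, 2.5, 0.0, 1.0, 4.0])
print(x.shape)  # (5, 6)
```

Stacking the tracks as channels lets a first-layer filter see the sequence and its chromatin context at the same positions; other designs, such as separate input branches, are equally plausible.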
We hypothesized that TF motifs that are pivotal in specific developmental stages underlie the key features distinguishing regulatory elements and developmental stages. Interpreting the first convolutional layers, in particular comparing each filter kernel (an approximated TF [aTF] motif) with known TF motifs, provided insight into what the model learned during training in order to associate DNA sequences with distinct developmental stages. Filter weight shuffling was applied to assess the importance of each aTF for the classification task, while in silico mutagenesis was used to evaluate the significance of each position within aTF motifs. The activation profile of aTFs was used as a proxy to estimate both their synergistic or antagonistic action and their association with distinct developmental stages.
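The two perturbation analyses described above can be sketched on a toy model. This is a minimal NumPy illustration, assuming a single convolutional layer with ReLU, global max pooling, and a linear output; the hand-built "GATA" filter, the output weights, and all function names are hypothetical stand-ins and not the trained CINNAMON network.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(seq):
    """One-hot encode a DNA string into a (length, 4) array (A, C, G, T)."""
    idx = np.array(["ACGT".index(b) for b in seq])
    return np.eye(4, dtype=np.float32)[idx]

def conv_activations(x, filters):
    """Valid 1D cross-correlation plus ReLU. x: (L, 4); filters: (F, K, 4)."""
    k = filters.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=0)  # (L-K+1, 4, K)
    scores = np.einsum("pck,fkc->pf", windows, filters)
    return np.maximum(scores, 0.0)

def model_score(x, filters, out_weights):
    """Toy model: global max pool over positions, then a linear output."""
    return float(conv_activations(x, filters).max(axis=0) @ out_weights)

def filter_importance(x, filters, out_weights, f, n_shuffles=20):
    """Mean drop in score when the weights of filter f are randomly shuffled."""
    base = model_score(x, filters, out_weights)
    drops = []
    for _ in range(n_shuffles):
        shuffled = filters.copy()
        shuffled[f] = rng.permutation(filters[f].ravel()).reshape(filters[f].shape)
        drops.append(base - model_score(x, shuffled, out_weights))
    return float(np.mean(drops))

def mutagenesis(x, filters, out_weights):
    """Score change for every single-base substitution; returns an (L, 4) array."""
    base = model_score(x, filters, out_weights)
    effects = np.zeros_like(x)
    for pos in range(x.shape[0]):
        for b in range(4):
            mutant = x.copy()
            mutant[pos] = 0.0
            mutant[pos, b] = 1.0
            effects[pos, b] = model_score(mutant, filters, out_weights) - base
    return effects

# Filter 0 matches "GATA"; filter 1 is all zeros and contributes nothing.
filters = np.stack([one_hot("GATA"), np.zeros((4, 4), dtype=np.float32)])
out_weights = np.array([1.0, 0.5], dtype=np.float32)
x = one_hot("CCGATACC")

importance = [filter_importance(x, filters, out_weights, f) for f in range(2)]
effects = mutagenesis(x, filters, out_weights)
print(importance)       # filter 1 has importance 0.0 by construction
print(effects.min())    # -1.0: the largest loss from a single substitution
```

In this toy case, substitutions inside the GATA match lower the score while substitutions in the flanks leave it unchanged, which is the kind of per-position signal in silico mutagenesis is meant to expose. In practice the same loops would run over a trained network and many regions, with classification performance rather than a raw score as the readout.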
Our effort provides a novel methodology, based on explainable Deep Learning modeling of experimental data, for gaining insight into the regulatory dynamics that govern crucial biological processes.
[1] Minderjahn J, Schmidt A, Fuchs A, et al. Mechanisms governing the pioneering and redistribution capabilities of the non-classical pioneer PU.1. Nature Communications. 2020;11:402.
[2] Johnson JL, Georgakilas G, Petrovic J, et al. Lineage-Determining Transcription Factor TCF-1 Initiates the Epigenetic Identity of T Cells. Immunity. 2018;48(2):243-257.e10. doi:10.1016/j.immuni.2018.01.012.
Submission Number: 140