A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules

Moustafa Abdalla, Mohamed Abdalla

Published: 2022, Last Modified: 17 Oct 2023, PLoS Comput. Biol. 2022, Readers: Everyone

Abstract (Author summary): High-throughput assays are the cornerstone of modern drug discovery and a useful tool for translating the hundreds of genetic discoveries associated with human traits and disease into functional understanding. All high-throughput assays can be described as empirical assessments of the activity of biological entities (e.g., genetic variation, DNA sequences, small molecules) by a standardized output, usually in the form of optically detectable labels (i.e., reporters) or, more rarely, (scalable) high-dimensional measurements (e.g., L1000, RNA-seq). Here, we introduce a modular and readily extensible computational framework, called peaBrain, that leverages a convolutional neural network architecture to enable in silico recapitulation of certain features of these high-throughput assays. We show that peaBrain can predict the expression of genes in a tissue-specific manner and outperforms regularized linear models in predicting the consequences of individual genotype variation. We further highlight the utility of the framework in predicting the transcriptomic impact of small molecules and shRNA (on par with in vitro experimental replication of external test sets), explore how peaBrain can be used to model difficult-to-study processes (such as neural induction), and finally identify putatively functional eQTLs that are missed by high-throughput experimental approaches.
