TL;DR: InfoSEM is a VAE-based generative model for GRN inference that incorporates informative priors on the adjacency matrix.
Abstract: Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance on this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases, such as class imbalance among GT interactions, rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. When GT labels are available, InfoSEM can also integrate them as an additional prior, avoiding these biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals the biases learned by existing supervised methods. Using the textual embedding prior, InfoSEM outperforms existing models by 38.5% across four datasets, and it improves performance by a further 11.1% when labeled data are also integrated as priors.
Lay Summary: To understand how our cells work, scientists study how genes turn each other on or off. New biotechnologies let us measure gene activity in individual cells, but the data are messy and hard to analyze. Current computational models that do this well often depend on expensive lab results and may give misleading answers by relying too heavily on patterns in the data that don't reflect real biology. We created a new AI tool called InfoSEM that learns how genes interact by combining generative models with existing knowledge about what genes do, without needing costly lab results. When lab data are available, InfoSEM can still use them to improve its accuracy, but in a way that avoids common pitfalls. We also designed a new way to test these tools that better matches real scientific problems. InfoSEM gives more accurate results than other tools, including those that use expensive lab data. It helps scientists find real gene interactions more reliably, which makes it a powerful and practical tool for studying health, disease, and medicine.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Health / Medicine
Keywords: Gene Regulatory Networks inference, Informative priors, scRNA-seq, Variational Autoencoder
Submission Number: 6921