Transcriptome-wide root causal inference

Eric V. Strobl, Donna K. Slonim, Eric R. Gamazon

Published: 02 Sept 2025, Last Modified: 26 Feb 2026, PLOS Computational Biology, CC BY-SA 4.0
Abstract: Author summary. Many diseases progress through causal chains. The earliest step detectable in gene expression is a small set of root causal genes, whose expression levels change first after genetic or non-genetic triggers. Because gene expression is relatively easy to perturb, focusing on these early changes offers a tractable route to stopping disease with a sparse set of interventions. Yet most existing tools either require expensive perturbation screens or fail to distinguish true early causes from downstream consequences. Transcriptome-Wide Root Causal Inference (TWRCI) uses widely available genotype data and bulk RNA-seq to identify these first expression events and quantify their patient-specific effects. TWRCI assigns each genetic variant to the single target it most directly influences (either a gene or the disease outcome) via a head-to-head prediction test, reconstructs the causal chain among genes, and estimates each gene’s patient-specific root causal effect, integrating genetic and non-genetic drivers into an interpretable effect size. In simulations and in two diseases, TWRCI outperformed alternatives, recovered compact sets of early-acting genes consistent with known biology, detected variants that act directly on disease rather than through gene expression, and produced findings that replicated across cohorts. Most variation in root causal effects was non-genetic, pointing to environmental triggers.
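The variant-assignment idea in the abstract can be illustrated with a toy sketch. This is not the TWRCI procedure itself: the abstract does not specify the head-to-head prediction test, so the sketch substitutes a plain squared-correlation comparison, and the simulated chain (variant_0 -> gene_a -> gene_b -> disease), the effect sizes, and the names assign_variants, gene_a, and gene_b are all assumptions made for illustration.

```python
import numpy as np

# Toy data (all values are illustrative assumptions, not from the paper).
rng = np.random.default_rng(0)
n = 2000
genotypes = rng.binomial(2, 0.3, size=(n, 3)).astype(float)  # 3 variants, 0/1/2 allele counts

# Assumed causal chain: gene_a -> gene_b -> disease.
# variant_0 acts on gene_a, variant_1 on gene_b, variant_2 directly on disease.
gene_a = 1.0 * genotypes[:, 0] + rng.normal(size=n)
gene_b = 0.8 * gene_a + 1.0 * genotypes[:, 1] + rng.normal(size=n)
disease = 0.8 * gene_b + 1.0 * genotypes[:, 2] + rng.normal(size=n)

targets = {"gene_a": gene_a, "gene_b": gene_b, "disease": disease}


def assign_variants(genotypes, targets):
    """Assign each variant to the one target it predicts best.

    Stand-in for a head-to-head test: compares squared Pearson
    correlation between the variant and every candidate target.
    """
    assignments = {}
    for j in range(genotypes.shape[1]):
        scores = {
            name: np.corrcoef(genotypes[:, j], values)[0, 1] ** 2
            for name, values in targets.items()
        }
        # The winning target is the one with the highest score.
        assignments[f"variant_{j}"] = max(scores, key=scores.get)
    return assignments


assignments = assign_variants(genotypes, targets)
print(assignments)
```

In this toy setting the signal of a variant weakens at each downstream step because every step adds noise, so the strongest association tends to fall on the direct target. Real data would need a proper statistical test and handling of linkage between variants, which this sketch leaves out.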