Keywords: Selection bias; Causal discovery; latent confounders; Gene regulatory newwork inference
Abstract: Gene regulatory network inference (GRNI) aims to discover how genes causally regulate each other from gene expression data. It is well-known that statistical dependencies in observed data do not necessarily imply causation, as spurious dependencies may arise from *latent confounders*, such as non-coding RNAs. Numerous GRNI methods have thus been proposed to address this confounding issue. However, dependencies may also result from *selection*—only cells satisfying certain survival or inclusion criteria are observed—while these selection-induced spurious dependencies are frequently overlooked in gene expression data analyses. In this work, we show that such selection is ubiquitous and, when ignored or conflated with true regulations, can lead to flawed causal interpretation and misguided intervention recommendations. To address this challenge, a fundamental question arises: can we distinguish dependencies due to regulation, confounding, and crucially, selection? We show that gene perturbations offer a simple yet effective answer: selection-induced dependencies are *symmetric* under perturbation, while those from regulation or confounding are not. Building on this motivation, we propose GISL (Gene regulatory network Inference in the presence of Selection bias and Latent confounders), a principled algorithm that leverages perturbation data to uncover both true gene regulatory relations and non-regulatory mechanisms of selection and confounding up to the equivalence class. Experiments on synthetic and real-world gene expression data demonstrate the effectiveness of our method.
Supplementary Material: zip
Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 17977
Loading