Keywords: Bioinformatics, Causal representation learning, Single cell perturbation, Interpretability
Abstract: Predicting cellular responses to single and combinatorial gene perturbations is a central challenge in functional genomics. A critical limitation of current models is their inability, both theoretically and methodologically, to disentangle perturbation-induced effects from the pervasive background transcriptional programs that remain invariant to perturbations yet dominate observed gene expression patterns. To address this, we propose a latent variable generative model that explicitly partitions the latent space into two parts. The first is a variant subspace, in which a latent causal model captures perturbation effects. The second is an invariant subspace that captures unperturbed cellular programs. Through identifiability analysis, we establish a principled foundation for disentangling these two subspaces and for identifying the latent causal model. We then translate these theoretical findings into a practical method, backed by these guarantees, that predicts perturbation effects more accurately. On both simulated and large-scale genetic perturbation benchmarks, the proposed method achieves state-of-the-art accuracy in predicting cellular responses to unseen combinations, significantly outperforming existing methods. Crucially, by disentangling unperturbed cellular programs from perturbation-induced effects, our method prevents the latter from being confounded with or absorbed into the dominant invariant patterns. This separation allows the true causal impact of perturbations to be isolated and reliably estimated, thereby enabling accurate prediction of unseen combinatorial gene perturbations at the single-cell level.
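To make the variant/invariant partition concrete, the following toy data-generating process is a minimal sketch in the spirit of the model described above, not the authors' method. It rests on several assumptions that do not come from the paper. Background programs are independent Gaussian latents. The variant latents follow a linear-Gaussian structural causal model over a DAG. A gene perturbation acts as an additive shift on its targeted latent node. A fixed softplus-linear decoder maps latents to expression. The names `make_model` and `simulate` are hypothetical.

```python
import numpy as np

def make_model(d_inv=3, d_var=4, d_genes=20, seed=0):
    """Hypothetical parameters: a DAG over the variant latents and a decoder."""
    rng = np.random.default_rng(seed)
    # Strictly lower-triangular weights => acyclic latent causal graph
    # (node j can only be caused by nodes k < j).
    A = np.tril(rng.normal(0.0, 0.8, size=(d_var, d_var)), k=-1)
    # Linear mixing from the concatenated latents [z_inv, z_var] to genes.
    W = rng.normal(0.0, 1.0, size=(d_inv + d_var, d_genes))
    return {"A": A, "W": W, "d_inv": d_inv, "d_var": d_var}

def simulate(model, n_cells, targets=(), shift=3.0, seed=1):
    """Sample cells; `targets` lists the latent nodes hit by a (combinatorial)
    shift perturbation. Returns expression x, invariant and variant latents."""
    rng = np.random.default_rng(seed)
    A, d_inv, d_var = model["A"], model["d_inv"], model["d_var"]
    # Invariant subspace: background programs, untouched by perturbations.
    z_inv = rng.normal(size=(n_cells, d_inv))
    # Exogenous noise of the variant latent causal model.
    eps = rng.normal(size=(n_cells, d_var))
    # Assumed shift intervention: add a constant to each targeted node.
    delta = np.zeros(d_var)
    delta[np.asarray(targets, dtype=int)] = shift
    # Linear SEM z = A z + eps + delta, solved as z = (I - A)^{-1} (eps + delta).
    z_var = np.linalg.solve(np.eye(d_var) - A, (eps + delta).T).T
    # Softplus keeps the simulated expression non-negative.
    x = np.logaddexp(0.0, np.concatenate([z_inv, z_var], axis=1) @ model["W"])
    return x, z_inv, z_var

model = make_model()
x_ctrl, zi_ctrl, zv_ctrl = simulate(model, 50)
x_pert, zi_pert, zv_pert = simulate(model, 50, targets=(0, 2))  # combinatorial
```

With a shared random seed, control and perturbed cells have identical invariant latents. Only the variant latents differ, and the shift propagates from the targeted nodes to their descendants in the assumed DAG. This illustrates the separation the abstract describes: perturbation effects live entirely in the variant subspace.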
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 2315