Keywords: Causal Representation Learning, scRNA-seq, VAE, Generative Model, Perturb-seq, Interpretability
Abstract: Experiments involving the perturbation of individual cells are central to understanding cellular mechanisms and can accelerate drug discovery.
Causal representation learning (CRL) allows us to uncover the latent factors that regulate biological systems and predict the impact of novel perturbations.
Unfortunately, because they rely on dense encoders, existing methods fail to address intervention spillover in the closed-world setting where intervention targets are known a priori, as in Perturb-seq experiments.
Furthermore, incorporating curated biological pathways into the model introduces a confirmation bias: the model is forced to explain the data through preexisting pathways, which narrows the set of hypotheses it can explore and discards novel signals that lie outside the annotated pathways.
In this work, we introduce RAPTORGraph, a $\beta$-VAE with a GraphPathway encoder that explicitly models complex gene-to-gene interactions within learned pathways.
Moreover, our model's preconditioning isolates the influence of perturbed genes, eliminating spillover and yielding the clean, single-node latent interventions required for identifiable causal discovery.
Finally, we train the model on data preprocessed with optimal-transport alignment, which guarantees a well-defined mapping between control and perturbed samples and further stabilizes the learned latent representations.
We demonstrate that RAPTORGraph improves on state-of-the-art performance in downstream analyses of unseen perturbations, such as non-additive interactions, while outperforming other approaches on quantitative metrics such as mean squared error (MSE) and multi-kernel maximum mean discrepancy (MK-MMD).
The code will be made publicly available upon publication of this paper.
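For readers unfamiliar with the MK-MMD metric mentioned above, the following is a minimal sketch of how such a score could be computed between predicted and observed expression profiles. It is not the authors' evaluation code: the function names `rbf_kernel_mix` and `mk_mmd2`, the choice of Gaussian (RBF) kernels, the bandwidth values, and the use of the biased estimator are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel_mix(a, b, bandwidths):
    """Average of Gaussian kernels over several bandwidths (hypothetical helper)."""
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    kernels = [np.exp(-sq / (2.0 * s ** 2)) for s in bandwidths]
    return sum(kernels) / len(bandwidths)


def mk_mmd2(x, y, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD under an equal-weight mixture of RBF kernels.

    The bandwidths are illustrative assumptions, not values from the paper.
    """
    k_xx = rbf_kernel_mix(x, x, bandwidths)
    k_yy = rbf_kernel_mix(y, y, bandwidths)
    k_xy = rbf_kernel_mix(x, y, bandwidths)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    observed = rng.normal(size=(200, 5))             # stand-in for observed cells
    good_pred = rng.normal(size=(200, 5))            # same distribution
    bad_pred = rng.normal(loc=2.0, size=(200, 5))    # shifted distribution
    print("close prediction:", mk_mmd2(observed, good_pred))
    print("shifted prediction:", mk_mmd2(observed, bad_pred))
```

A lower score indicates that the predicted and observed cell populations are closer in distribution; unlike MSE, the comparison is made between sets of cells rather than between paired samples.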
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 11570