PD-scWorld: Pathway-Guided Disentanglement for Single-Cell Perturbation World Models

Azmine Toushik Wasi

Published: 02 Mar 2026, Last Modified: 10 Mar 2026 | Gen² 2026 Poster | Readers: Everyone | CC BY 4.0

Track: Full / long paper (5-8 pages)

Keywords: single-cell perturbation modeling, latent world models, pathway-guided disentanglement, causal representation learning, virtual cell simulation

TL;DR: A pathway-guided world model learns interpretable, intervention-conditioned latent dynamics for single-cell perturbations, enabling accurate prediction, disentanglement, and counterfactual reasoning.

Abstract: Disentangling the latent factors that drive cellular responses remains challenging for generative and predictive models of single-cell data, particularly under perturbations where multiple biological programs co-activate. We propose PD-scWorld, an intervention-aware latent world model in which each latent factor $z_k$ is encouraged to respond selectively to specific perturbations and covariates, using only weak biological supervision in the form of pathway tags for perturbed genes rather than full factor labels. Given paired pre/post states, the model learns action-conditioned transitions $z' = T_\psi(z, a)$ while constraining the perturbation-induced change $\Delta z = z' - z$ to be group-sparse across latent dimensions associated with the pathway of $a$. Concretely, we introduce a pathway-conditional regularizer that penalizes dispersion of $\Delta z_k$ outside the designated latent group, combining group sparsity with a variance-based term $\sum_k \mathrm{Var}(\Delta z_k \mid a)$ to localize consistent effects and suppress entangled drift. This yields latents that align with known biological programs while retaining predictive flexibility for unseen perturbations. We evaluate on Perturb-seq and related CRISPR single-cell screens using gene-to-pathway mappings and cell-cycle annotations, measuring (i) mutual information between latents and covariates, (ii) recovery of pathway-specific responses, and (iii) counterfactual consistency under targeted rollouts. Across datasets, the proposed model produces cleaner factorization and more interpretable perturbation mechanisms than $\beta$-VAE and unstructured latent dynamics baselines, while improving accuracy of perturbation effect prediction and robustness to covariate shifts.
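
The pathway-conditional regularizer described in the abstract can be illustrated with a small numeric sketch. This is not the authors' implementation: the abstract does not give the exact loss, so the function name `pathway_regularizer`, the weights `lam_sparse` and `lam_var`, the `action_to_group` mapping, and the specific choices below are all assumptions. Here the group-sparsity part is taken to be the L2 norm of $\Delta z$ over latent dimensions outside the pathway group of $a$, and the variance part is $\sum_k \mathrm{Var}(\Delta z_k \mid a)$ summed over all $k$, as the formula is written, then averaged over actions.

```python
import math
from collections import defaultdict
from statistics import pvariance


def pathway_regularizer(delta_z, actions, action_to_group,
                        lam_sparse=1.0, lam_var=1.0):
    """Hypothetical pathway-conditional penalty on latent changes.

    delta_z         : list of per-cell latent changes (each a list of K floats)
    actions         : list of perturbation labels, one per cell
    action_to_group : dict mapping a perturbation label to the set of latent
                      indices assigned to its pathway (assumed to be given)
    Returns (total, sparsity_term, variance_term).
    """
    n_latent = len(delta_z[0])
    off_group_norms = []
    rows_by_action = defaultdict(list)

    for dz, a in zip(delta_z, actions):
        group = action_to_group[a]
        # Group-sparsity part: L2 norm of the change on off-pathway dimensions.
        off = [dz[k] for k in range(n_latent) if k not in group]
        off_group_norms.append(math.sqrt(sum(v * v for v in off)))
        rows_by_action[a].append(dz)

    sparsity = sum(off_group_norms) / len(off_group_norms)

    # Variance part: sum_k Var(delta z_k | a), averaged over observed actions.
    per_action = [
        sum(pvariance([r[k] for r in rows]) for k in range(n_latent))
        for rows in rows_by_action.values()
    ]
    variance = sum(per_action) / len(per_action)

    return lam_sparse * sparsity + lam_var * variance, sparsity, variance


# Toy example: 4 latents, two pathways owning dimensions {0, 1} and {2, 3}.
groups = {"KO_A": {0, 1}, "KO_B": {2, 3}}
dz = [
    [1.0, 0.5, 0.0, 0.0],  # KO_A, change confined to its own group
    [1.0, 0.5, 3.0, 4.0],  # KO_A, leaks into the other group
    [0.0, 0.0, 2.0, 2.0],  # KO_B, change confined to its own group
]
total, sparsity, variance = pathway_regularizer(dz, ["KO_A", "KO_A", "KO_B"], groups)
```

In this toy example only the second cell contributes to the sparsity term, since it is the only one whose change leaks outside its pathway group, and the variance term is driven by the disagreement between the two `KO_A` cells. In an actual training setup the same quantities would be computed with a differentiable tensor library and added to the transition loss.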

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Submission Number: 70
