Gene-Embedding Perturbation Operators for Zero-Shot and Transferable Prediction of Transcriptional Responses
Keywords: single-cell genomics, perturbation prediction, gene expression, CRISPR, Perturb-seq, zero-shot generalization, cross-cell-type transfer, transcriptomics, drug target discovery, scRNA-seq, genetic knockouts, biological foundation models, gene function prediction
TL;DR: We predict transcriptional responses to unseen genetic perturbations by encoding CRISPR knockouts as GRN-embedding-derived operators, enabling zero-shot prediction and cross-cell-type transfer without retraining.
Abstract: Predicting how genetic perturbations alter transcriptional programs is fundamental to understanding gene function, yet existing methods require perturbation-specific training data and cannot generalize to unseen target genes or new cell types. We introduce DYNAMO, a framework that parameterizes perturbation effects through gene embeddings derived from gene regulatory networks (GRNs), enabling prediction for any gene with a network embedding, including genes never perturbed during training. DYNAMO’s key architectural innovation is a frozen-plus-learnable embedding decomposition: frozen pretrained embeddings, such as Node2Vec or spectral embeddings, preserve GRN structure for unseen genes, while a zero-initialized learnable component adapts representations for training genes. Each perturbation is encoded as a low-rank operator within a Koopman formalism, and combinatorial perturbations compose via operator products. We report ρΔ, the Pearson correlation between predicted and observed expression changes. On K562 Perturb-seq with zero train–test perturbation overlap, DYNAMO achieves ρΔ = 0.283, while GEARS fails to produce predictions and scGen produces anti-correlated predictions (ρΔ = −0.126). The structured operator enables cross-cell-type transfer: a K562-trained model achieves ρΔ = 0.576 on RPE1 without retraining. On Norman combinatorial perturbations, DYNAMO achieves ρΔ = 0.563, outperforming GEARS by 2.7× and scGen by 2.8×. A systematic comparison of six embedding strategies reveals that GRN topology provides a 19% advantage over text, co-expression, and foundation-model embeddings. Ablations show that the structured operator, rather than a direct MLP, is critical for cross-cell-type transfer: it retains 51% of native performance, compared with 30% for an unstructured alternative.
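The following is a minimal NumPy sketch of the ideas named in the abstract, not the authors' implementation. It assumes a hypothetical latent state of dimension `state_dim`, hypothetical linear maps `W_u` and `W_v` from a gene embedding to low-rank factors, and an identity-plus-low-rank form for each operator; none of these details are specified in the abstract. Random vectors stand in for Node2Vec or spectral GRN embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, emb_dim, state_dim, rank = 5, 8, 6, 2  # illustrative sizes (assumed)

# Frozen pretrained gene embeddings; random stand-ins for GRN-derived vectors.
frozen = rng.normal(size=(n_genes, emb_dim))
# Learnable residual, zero-initialized, so every gene starts at its frozen embedding.
learnable = np.zeros((n_genes, emb_dim))

# Hypothetical linear maps from an embedding to the low-rank factors U and V.
W_u = rng.normal(scale=0.1, size=(emb_dim, state_dim * rank))
W_v = rng.normal(scale=0.1, size=(emb_dim, state_dim * rank))


def gene_embedding(g):
    """Frozen-plus-learnable decomposition of the embedding for gene index g."""
    return frozen[g] + learnable[g]


def perturbation_operator(g):
    """Identity plus a rank-`rank` update built from the gene embedding (assumed form)."""
    e = gene_embedding(g)
    U = (e @ W_u).reshape(state_dim, rank)
    V = (e @ W_v).reshape(state_dim, rank)
    return np.eye(state_dim) + U @ V.T


def combined_operator(genes):
    """Compose several perturbations as a product of their operators."""
    K = np.eye(state_dim)
    for g in genes:
        K = perturbation_operator(g) @ K
    return K


def pearson_delta(pred, true, control):
    """Pearson correlation between predicted and observed changes from control."""
    return float(np.corrcoef(pred - control, true - control)[0, 1])
```

In this sketch, a gene that was never perturbed during training keeps a zero residual, so its operator is determined entirely by its frozen embedding; this is one way to read the zero-shot claim. A combinatorial prediction would apply `combined_operator([a, b])` to a latent control state.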
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 177