PerturBERT: Learning Gene Co-Variation Embeddings from Perturbation Signatures

Artur Szałata; Russell Littman; Zoe Piran; Fabian J Theis; David Richmond; Jan-Christian Huetter; Alexander P Wu

Published: 02 Mar 2026, Last Modified: 03 Jun 2026, MLGenX 2026 Tiny Paper Track, CC BY 4.0

Abstract: Current foundation models for transcriptomic data are typically trained in a self-supervised manner to predict masked gene expression values within a sample from the remaining genes, thereby learning gene co-variation patterns from observational data. However, many translational applications require understanding how gene expression changes in response to interventions. We introduce PerturBERT, an encoder-only transformer that learns perturbational co-variation patterns from gene perturbation responses. PerturBERT is pre-trained with masked-gene modeling on ~1M perturbation signatures across 248 cell lines; it tokenizes each signature as a set of (downstream gene, response) pairs and produces gene embeddings contextualized by their responses to interventions. PerturBERT gene embeddings achieve state-of-the-art results on a gene embedding benchmark and on gene dependency prediction. To our knowledge, PerturBERT is the first transformer explicitly pre-trained on gene perturbation responses, and its representations complement those of models trained on observational gene expression profiles.
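The (downstream gene, response) tokenization and the masked-gene objective summarized above can be illustrated with a minimal sketch. It assumes that each signature is a mapping from downstream gene name to one scalar response (for instance a z-scored expression change), that masking hides a gene's response while keeping its identity visible, and that training minimizes mean squared error over the masked genes. The vocabulary, mask fraction, and the names `tokenize_signature`, `mask_tokens`, and `masked_mse` are hypothetical and are not taken from PerturBERT; the transformer encoder itself is omitted.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    gene_id: int     # index of the downstream gene in an assumed vocabulary
    response: float  # assumed scalar response, e.g. a z-scored expression change


def tokenize_signature(signature: Dict[str, float],
                       gene_vocab: Dict[str, int]) -> List[Token]:
    """Turn one perturbation signature {gene: response} into (gene, response) tokens.

    Genes missing from the vocabulary are dropped. Sorting by gene name only
    makes the output deterministic; the pairs are treated as an unordered set.
    """
    return [Token(gene_vocab[g], r)
            for g, r in sorted(signature.items()) if g in gene_vocab]


def mask_tokens(tokens: List[Token], mask_frac: float = 0.15, seed: int = 0
                ) -> Tuple[List[Tuple[int, Optional[float]]], Dict[int, float]]:
    """Hide the responses of a random subset of genes (gene identity stays visible).

    Returns the model inputs, where masked responses are None, and the
    {gene_id: true response} targets a model would be trained to recover.
    At least one token is masked, so `tokens` must be non-empty.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_frac * len(tokens)))
    masked = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [(t.gene_id, None if i in masked else t.response)
              for i, t in enumerate(tokens)]
    targets = {tokens[i].gene_id: tokens[i].response for i in masked}
    return inputs, targets


def masked_mse(predictions: Dict[int, float], targets: Dict[int, float]) -> float:
    """Mean squared error computed only over the masked genes."""
    return sum((predictions[g] - y) ** 2 for g, y in targets.items()) / len(targets)
```

For example, `tokenize_signature({"GENE_B": 0.25, "GENE_A": -0.5}, {"GENE_A": 0, "GENE_B": 1})` returns two tokens ordered by gene name, and calling `mask_tokens` on that list hides one of the two responses and returns it as the target. In a setup like this one, an encoder would embed every visible (gene, response) pair, substitute some mask representation for each hidden response, and be scored with `masked_mse` on its predictions for the held-out genes.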

Submission Number: 46