Track: Tiny paper track (up to 4 pages)
Keywords: perturbation modeling, masked language modeling, gene embeddings
TL;DR: PerturBERT is an encoder-only transformer pre-trained with masked-gene modeling on gene perturbation signatures.
Abstract: Current foundation models for transcriptomic data are typically trained in a self-supervised manner to predict gene expression within a sample given other genes, thereby learning gene co-variation patterns from observational data. However, many translational applications require understanding how gene expression changes in response to interventions. We introduce PerturBERT, an encoder-only transformer pre-trained with masked-gene modeling on approximately 1M perturbation signatures across 248 cell lines, which learns perturbational co-variance patterns from gene perturbation responses. PerturBERT tokenizes each signature as a set of (gene, response) pairs and produces gene embeddings contextualized by their response to interventions. PerturBERT gene embeddings achieve state-of-the-art results on a gene embedding benchmark and on gene dependency prediction. To our knowledge, PerturBERT is the first transformer explicitly pre-trained on gene perturbation responses, offering representations complementary to those of models trained on observational gene expression profiles.
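The abstract's input scheme (each perturbation signature tokenized as a set of (gene, response) pairs, with some gene tokens masked for the masked-gene-modeling objective) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`bin_response`, `tokenize_signature`), the response binning scheme, the `[MASK]` token, and the masking rate are all hypothetical choices made for the example.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, as in BERT-style objectives

def bin_response(z: float, n_bins: int = 5, lo: float = -3.0, hi: float = 3.0) -> int:
    """Discretize a continuous perturbation response (e.g. a z-score)
    into one of n_bins equal-width bins; an assumed tokenization choice."""
    z = min(max(z, lo), hi)
    width = (hi - lo) / n_bins
    return min(int((z - lo) / width), n_bins - 1)

def tokenize_signature(signature: dict[str, float],
                       mask_prob: float = 0.15,
                       seed: int = 0):
    """Turn a {gene: response} signature into (gene_token, response_bin)
    pairs. With probability mask_prob the gene token is replaced by MASK,
    and the original gene symbol becomes the prediction target."""
    rng = random.Random(seed)
    tokens, targets = [], []
    # The signature is a set of pairs; sorting just fixes an order here.
    for gene, response in sorted(signature.items()):
        if rng.random() < mask_prob:
            tokens.append((MASK, bin_response(response)))
            targets.append(gene)  # model must recover the masked gene
        else:
            tokens.append((gene, bin_response(response)))
            targets.append(None)  # unmasked positions contribute no loss
    return tokens, targets

# Example signature with illustrative gene symbols and responses
sig = {"TP53": -2.1, "MYC": 0.4, "EGFR": 1.7, "KRAS": -0.2}
tokens, targets = tokenize_signature(sig, mask_prob=0.5, seed=1)
```

An encoder would then embed each (gene, response-bin) pair, and the masked-gene loss would be computed only at positions where `targets` is not `None`, which is how the model is pushed to learn perturbational co-variance rather than observational co-expression.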
AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.
Submission Number: 46