Published 18 Jun 2025 · CC BY 4.0
In silico modeling of transcriptional responses to perturbations is crucial for advancing our understanding of cellular processes and disease mechanisms. We present PertEval-scFM, a standardized framework designed to evaluate models for perturbation effect prediction. We apply PertEval-scFM to benchmark zero-shot single-cell foundation model (scFM) embeddings against baseline models to assess whether these contextualized representations enhance perturbation effect prediction. Our results show that scFM embeddings offer limited improvement over simple baseline models in the zero-shot setting, particularly under distribution shift. Overall, this study provides a systematic evaluation of zero-shot scFM embeddings for perturbation effect prediction, highlighting the challenges of this task and the limitations of current-generation scFMs. Our findings underscore the need for specialized models and high-quality datasets that capture a broader range of cellular states. Source code and documentation can be found at: https://github.com/aaronwtr/PertEval.
To develop new treatments, scientists study how cells respond when specific genes are changed. This is called a genetic perturbation experiment. Measuring these effects one by one is costly and slow, so researchers are exploring whether AI can predict them instead.
Single-cell foundation models (scFMs) are large machine learning models trained on massive single-cell RNA sequencing datasets. The hope is that they learn general principles of cell behavior, enabling informative predictions about cellular states.
Our work introduces PertEval, a benchmark that tests whether the zero-shot embeddings produced by scFMs contain meaningful information for predicting perturbation effects. Given a pair of cells — one perturbed and one unperturbed — a simple model uses representations produced by the scFMs to predict how the cell changed.
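The probing setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding dimension, gene count, and synthetic data are all placeholder assumptions, and the "simple model" here is a ridge-regularized linear probe fit with NumPy rather than whatever downstream model PertEval uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 256-d frozen scFM
# embeddings, 50 measured genes, 500 control cells.
n_cells, d_emb, n_genes = 500, 256, 50

# Zero-shot scFM embeddings of unperturbed (control) cells. In practice
# these would come from a pretrained foundation model's encoder.
ctrl_emb = rng.normal(size=(n_cells, d_emb))

# Observed perturbation effect: per-gene change in expression between the
# perturbed and unperturbed cell (synthetic ground truth, for illustration).
true_w = rng.normal(size=(d_emb, n_genes))
delta_expr = ctrl_emb @ true_w + 0.1 * rng.normal(size=(n_cells, n_genes))

# "Simple model": a ridge-regularized linear probe mapping the frozen
# embeddings to the perturbation-induced expression change.
lam = 1.0
A = ctrl_emb.T @ ctrl_emb + lam * np.eye(d_emb)
w_hat = np.linalg.solve(A, ctrl_emb.T @ delta_expr)

pred = ctrl_emb @ w_hat
mse = float(np.mean((pred - delta_expr) ** 2))
print(f"probe MSE: {mse:.4f}")
```

Keeping the probe deliberately simple isolates the question of interest: whether the zero-shot embeddings themselves carry perturbation-relevant signal, rather than whether a powerful downstream model can compensate for embeddings that do not.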
We evaluate five leading scFMs and find that, in the zero-shot setting, they often fail to accurately predict perturbation effects. Most models do not outperform simple baselines, particularly when evaluated on strong or atypical perturbations. PertEval offers a standard and rigorous way to test how well these models perform, highlighting the limitations of current approaches and helping guide the development of more robust tools.