Published 18 Jun 2025 · CC BY 4.0
In silico modeling of transcriptional responses to perturbations is crucial for advancing our understanding of cellular processes and disease mechanisms. We present PertEval-scFM, a standardized framework designed to evaluate models for perturbation effect prediction. We apply PertEval-scFM to benchmark zero-shot single-cell foundation model (scFM) embeddings against baseline models to assess whether these contextualized representations enhance perturbation effect prediction. Our results show that scFM embeddings offer limited improvement over simple baseline models in the zero-shot setting, particularly under distribution shift. Overall, this study provides a systematic evaluation of zero-shot scFM embeddings for perturbation effect prediction, highlighting the challenges of this task and the limitations of current-generation scFMs. Our findings underscore the need for specialized models and high-quality datasets that capture a broader range of cellular states. Source code and documentation can be found at: https://github.com/aaronwtr/PertEval.
To develop new treatments, scientists study how cells respond when specific genes are changed. This is called a genetic perturbation experiment. Measuring these effects one by one is costly and slow, so researchers are exploring whether AI can predict them instead.
Single-cell foundation models (scFMs) are large machine learning models trained on massive single-cell RNA sequencing datasets. The hope is that they learn general principles of cell behavior, enabling informative predictions about cellular states.
Our work introduces PertEval, a benchmark that tests whether the zero-shot embeddings produced by scFMs contain meaningful information for predicting perturbation effects. Given a pair of cells — one perturbed and one unperturbed — a simple model uses representations produced by the scFMs to predict how the cell changed.
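The probing setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding dimension, gene count, and synthetic data are all placeholder assumptions, and the "simple model" here is a ridge-regularized linear probe fit with NumPy rather than whatever downstream model PertEval uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 256-d frozen scFM
# embeddings, 50 measured genes, 500 control cells.
n_cells, d_emb, n_genes = 500, 256, 50

# Zero-shot scFM embeddings of unperturbed (control) cells. In practice
# these would come from a pretrained foundation model's encoder.
ctrl_emb = rng.normal(size=(n_cells, d_emb))

# Observed perturbation effect: per-gene change in expression between the
# perturbed and unperturbed cell (synthetic ground truth, for illustration).
true_w = rng.normal(size=(d_emb, n_genes))
delta_expr = ctrl_emb @ true_w + 0.1 * rng.normal(size=(n_cells, n_genes))

# "Simple model": a ridge-regularized linear probe mapping the frozen
# embeddings to the perturbation-induced expression change.
lam = 1.0
A = ctrl_emb.T @ ctrl_emb + lam * np.eye(d_emb)
w_hat = np.linalg.solve(A, ctrl_emb.T @ delta_expr)

pred = ctrl_emb @ w_hat
mse = float(np.mean((pred - delta_expr) ** 2))
print(f"probe MSE: {mse:.4f}")
```

Keeping the probe deliberately simple isolates the question of interest: whether the zero-shot embeddings themselves carry perturbation-relevant signal, rather than whether a powerful downstream model can compensate for embeddings that do not.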
We evaluate five leading scFMs and find that, in the zero-shot setting, they often fail to accurately predict perturbation effects. Most models do not outperform simple baselines, particularly when evaluated on strong or atypical perturbations. PertEval offers a standard and rigorous way to test how well these models perform, highlighting the limitations of current approaches and helping guide the development of more robust tools.