Keywords: Perturbation Screens, Perturb-seq, Virtual Cell, AI for Biology
TL;DR: Nearest-neighbor baselines match LLM-based methods on perturbation outcome prediction, suggesting the field needs stronger baselines before scaling complexity.
Abstract: Predicting cellular responses to genetic perturbations is central to understanding biology and unlocking more efficient drug and genetic therapy discovery. Recent approaches leverage large language models and deep learning for this task, yet simple baselines for predicting categorical outcomes—such as whether a gene is differentially expressed or up- or down-regulated—remain underexplored. We evaluate two simple baselines on Perturb-seq screens from four cell lines: a gene-based majority vote and an embedding-based $k$-nearest neighbors classifier. On curated benchmarks, majority vote alone achieves accuracies of 0.62--0.80, but collapses on full, unfiltered data, exposing how dataset curation can inflate model performance. On the same unfiltered data, a nearest-neighbor classifier matches LLM-based methods and remains competitive with state-of-the-art deep generative models in cross-cell-line transfer tasks. These results highlight the need for stronger baselines and for directly modeling categorical perturbation outcomes.
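The two baselines named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes per-gene categorical outcome labels (e.g. "up", "down", "none") stored as an array of shape (perturbations, genes), and hypothetical perturbation embeddings for the k-NN variant.

```python
import numpy as np
from collections import Counter

def majority_vote(train_labels):
    """Gene-based majority vote: predict each gene's most frequent
    categorical outcome across all training perturbations.
    train_labels: (n_perturbations, n_genes) array of string labels."""
    return np.array([
        Counter(train_labels[:, g]).most_common(1)[0][0]
        for g in range(train_labels.shape[1])
    ])

def knn_predict(train_emb, train_labels, query_emb, k=3):
    """Embedding-based k-NN: find the k training perturbations nearest
    to the query in embedding space, then majority-vote per gene
    over only those neighbors."""
    dists = np.linalg.norm(train_emb - query_emb, axis=1)
    nn = np.argsort(dists)[:k]  # indices of the k nearest perturbations
    return majority_vote(train_labels[nn])
```

The point of the sketch is that both baselines are training-free lookups; any embedding (e.g. from gene co-expression or a pretrained model) can be dropped into `knn_predict` unchanged.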
Submission Number: 121