Rethinking Perturbation Prediction Baselines

Published: 02 Mar 2026, Last Modified: 02 Mar 2026 · MLGenX 2026 Tiny paper track · CC BY 4.0
Abstract: Predicting cellular responses to genetic perturbations is central to understanding biology and unlocking more efficient drug and genetic therapy discovery. Recent approaches leverage large language models and deep learning for this task, yet simple baselines for predicting categorical outcomes—such as whether a gene is differentially expressed or up- or down-regulated—remain underexplored. We evaluate two simple baselines on Perturb-seq screens from four cell lines: a gene-based majority vote and an embedding-based $k$-nearest neighbors classifier. On curated benchmarks, majority vote alone achieves accuracies of 0.62--0.80, but collapses on full, unfiltered data, exposing how dataset curation can inflate model performance. On the same unfiltered data, a nearest-neighbor classifier matches LLM-based methods and remains competitive with state-of-the-art deep generative models in cross-cell-line transfer tasks. These results highlight the need for stronger baselines and for directly modeling categorical perturbation outcomes.
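The two baselines named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the embeddings and outcome labels below are synthetic placeholders, and the three-way label coding (down / unchanged / up) is an assumption about how categorical outcomes might be encoded.

```python
# Hedged sketch of the two baselines from the abstract, on synthetic data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_genes, dim = 200, 20, 50, 32

# labels[i, g] in {0: down-regulated, 1: unchanged, 2: up-regulated}
# for training perturbation i and gene g (synthetic placeholder labels).
train_labels = rng.integers(0, 3, size=(n_train, n_genes))

# Baseline 1: gene-wise majority vote -- predict, for every gene, its most
# frequent outcome across training perturbations, regardless of the query.
majority = np.array(
    [np.bincount(train_labels[:, g], minlength=3).argmax() for g in range(n_genes)]
)
mv_preds = np.tile(majority, (n_test, 1))  # same vector for every test perturbation

# Baseline 2: k-nearest neighbors in an embedding space (embeddings are
# random stand-ins here; the paper would use real perturbation embeddings).
train_emb = rng.normal(size=(n_train, dim))
test_emb = rng.normal(size=(n_test, dim))
knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(train_emb, train_labels)  # multi-output: one label column per gene
knn_preds = knn.predict(test_emb)
```

Both baselines emit one categorical call per (perturbation, gene) pair, so `mv_preds` and `knn_preds` share the shape `(n_test, n_genes)` and can be scored with the same accuracy metric.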
Track: Tiny paper track (up to 4 pages)
Keywords: perturbation prediction, differential expression, Perturb-seq, AI benchmarking
TLDR: Nearest-neighbor baselines match LLM-based methods on perturbation outcome prediction, suggesting the field needs stronger baselines before scaling complexity.
AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.
Submission Number: 80