Cell-Level Virtual Screening

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: GenBio 2026 Poster
Readers: Everyone
License: CC BY 4.0
Keywords: virtual screening, multimodal, perturbation modeling, multi-task learning, representation learning, sample-specific, graphical models
TL;DR: We challenge basic assumptions about virtual cell methods, and develop two new benchmarks targeting virtual screening applications. Based on our findings, we introduce a SOTA model for cell-level virtual screening.
Abstract: Virtual screening methods prioritize therapeutic candidates by predicting molecular properties and interactions. However, molecular models are insufficient to predict higher-order effects that arise in real biological systems. This blind spot leads to many late-stage failures in drug discovery. Virtual cells have been proposed as a solution to this problem by predicting gene expression responses to drugs, but they remain weakly validated as screening tools; gene expression is only an intermediate in understanding drug success or failure. Despite burgeoning progress in virtual cells, some basic questions remain. Is expression even a good representation of higher-order drug effects? How can virtual cell methods be applied to prioritize therapeutic candidates? Can they be fairly compared against traditional molecular-level screens? We address these questions in a two-pronged approach. First, we curate two benchmarks that directly compare virtual cells against traditional molecular methods on canonical drug discovery tasks. Drug-Disease Bench evaluates a method's ability to prioritize disease indications for drugs with novel target profiles. Drug-Target Bench evaluates a method's ability to reconstruct drug-target interactions from separate perturbation modalities that act on shared mechanisms, bridging the gap between cell-level methods and classic molecular screens. Second, we identify shortcomings of existing virtual cells on these benchmarks, and propose an alternative representation of cell state: gene networks. Inferring post-perturbation gene networks on-demand for unseen drugs requires methods that generalize beyond traditional plug-in network estimators. We develop a scalable differentiable surrogate loss for multivariate Gaussians, which we apply to train a context encoder that maps perturbation metadata to full gene-gene dependency network parameters.
The resulting model, CellVS-Net, achieves SOTA on predicting how gene-gene networks restructure under a variety of complex multivariate experimental conditions, including different cell types, small molecules, large molecules, gene knockdowns, and gene overexpressions. When compared to other molecular and cell-level representations of drugs, we find that CellVS-Net achieves SOTA on both virtual screening benchmarks. Overall, CellVS-Net provides the first demonstration that cell-level virtual screening methods are a viable alternative to molecular screening, and the associated benchmarks enable future hill-climbing on clinically relevant tasks. We provide source code for models and data curation, as well as public leaderboards.
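
To make the idea of a gene-gene dependency network under a multivariate Gaussian more concrete, the sketch below scores a candidate precision (inverse covariance) matrix with the standard Gaussian negative log-likelihood and reads the network off as partial correlations. This is only an illustration of the general setting: the abstract does not specify the form of the surrogate loss or the encoder, so the function names, the toy 3-gene chain, and the use of the exact likelihood here are all assumptions and should not be read as the CellVS-Net method.

```python
import numpy as np


def gaussian_nll(precision, sample_cov):
    """Zero-mean Gaussian negative log-likelihood, up to an additive constant.

    Computes 0.5 * (trace(S @ Theta) - logdet(Theta)) for a sample covariance S
    and a candidate precision matrix Theta. This is the textbook likelihood,
    used here as a stand-in; it is NOT the paper's surrogate loss.
    """
    # Cholesky raises LinAlgError if the precision matrix is not positive definite.
    chol = np.linalg.cholesky(precision)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (np.trace(sample_cov @ precision) - logdet)


def partial_correlations(precision):
    """Convert a precision matrix into partial correlations (network edge weights)."""
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical toy network: gene 0 -- gene 1 -- gene 2, with no direct 0--2 edge.
true_precision = np.array(
    [
        [2.0, -0.8, 0.0],
        [-0.8, 2.0, -0.8],
        [0.0, -0.8, 2.0],
    ]
)
true_cov = np.linalg.inv(true_precision)

# The likelihood is minimized at the precision matrix that generated the covariance,
# so the true network scores better than an "all genes independent" baseline.
nll_true = gaussian_nll(true_precision, true_cov)
nll_identity = gaussian_nll(np.eye(3), true_cov)

pcor = partial_correlations(true_precision)
```

In a setup like the one the abstract describes, the precision matrix would presumably be produced by a context encoder from perturbation metadata rather than fixed by hand, and a scalable surrogate would replace the exact log-determinant, which becomes expensive for thousands of genes.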
Submission Number: 141