Keywords: virtual screening, multimodal, perturbation modeling, multi-task learning, representation learning, sample-specific, graphical models
TL;DR: We challenge basic assumptions about virtual cell methods, and develop two new benchmarks targeting virtual screening applications. Based on our findings, we introduce a SOTA model for cell-level virtual screening.
Abstract: Virtual screening methods prioritize therapeutic candidates by predicting molecular properties and interactions.
However, molecular models are insufficient to predict higher-order effects that arise in real biological systems.
This blind spot leads to many late-stage failures in drug discovery.
Virtual cells have been proposed as a solution to this problem: they predict gene expression responses to drugs. However, they remain weakly validated as screening tools, and gene expression is only an intermediate in understanding drug success or failure.
Despite burgeoning progress in virtual cells, some basic questions remain.
Is expression even a good representation of higher-order drug effects?
How can virtual cell methods be applied to prioritize therapeutic candidates?
Can they be fairly compared against traditional molecular-level screens?
We address these questions with a two-pronged approach.
First, we curate two benchmarks that directly compare virtual cells against traditional molecular methods on canonical drug discovery tasks.
Drug-Disease Bench evaluates a method's ability to prioritize disease indications for drugs with novel target profiles.
Drug-Target Bench evaluates a method's ability to reconstruct drug-target interactions from separate perturbation modalities that act on shared mechanisms, bridging the gap between cell-level methods and classic molecular screens.
Second, we identify shortcomings of existing virtual cells on these benchmarks and propose an alternative representation of cell state: gene networks.
Inferring post-perturbation gene networks on-demand for unseen drugs requires methods that generalize beyond traditional plug-in network estimators.
We develop a scalable differentiable surrogate loss for multivariate Gaussians, which we apply to train a context encoder that maps perturbation metadata to full gene-gene dependency network parameters.
The resulting model, CellVS-Net, achieves SOTA performance in predicting how gene-gene networks restructure under a variety of complex multivariate experimental conditions, including different cell types, small molecules, large molecules, gene knockdowns, and gene overexpressions.
When compared to other molecular and cell-level representations of drugs, we find that CellVS-Net achieves SOTA on both virtual screening benchmarks.
Overall, CellVS-Net provides the first demonstration that cell-level virtual screening methods are a viable alternative to molecular screening, and the associated benchmarks enable future hill-climbing on clinically relevant tasks.
We provide source code for models and data curation, as well as public leaderboards.
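To make the idea of training a context encoder against a differentiable Gaussian surrogate concrete, the following is a minimal sketch and not the CellVS-Net loss, which the abstract does not specify. It swaps in a standard node-wise regression pseudo-likelihood, in which each gene is predicted from all the others, and trains a linear encoder by plain gradient descent. The function names, the one-hot context, the four-gene chain network, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def encode_context(context, weights, n_genes):
    """Hypothetical linear context encoder: context vector -> gene-gene coefficients."""
    coef = (context @ weights).reshape(n_genes, n_genes)
    np.fill_diagonal(coef, 0.0)  # a gene is never used to predict itself
    return coef


def pseudo_likelihood_loss(expr, coef):
    """Half the mean squared residual when each gene is regressed on all others.

    expr: (n_cells, n_genes) centered expression matrix.
    coef[k, j]: weight of gene k when predicting gene j.
    """
    resid = expr - expr @ coef
    return 0.5 * np.mean(np.sum(resid ** 2, axis=1))


def loss_gradient(expr, context, weights, n_genes):
    """Analytic gradient of the loss with respect to the encoder weights."""
    coef = encode_context(context, weights, n_genes)
    resid = expr - expr @ coef
    grad_coef = -(expr.T @ resid) / expr.shape[0]
    np.fill_diagonal(grad_coef, 0.0)  # diagonal entries are masked, so no gradient
    return np.outer(context, grad_coef.ravel())


def run_demo(seed=0, n_cells=2000, n_steps=500, step_size=0.2):
    rng = np.random.default_rng(seed)
    # Toy chain network over 4 genes: 0 - 1 - 2 - 3 (assumed, not real data).
    precision = np.array([[2.0, 0.8, 0.0, 0.0],
                          [0.8, 2.0, 0.8, 0.0],
                          [0.0, 0.8, 2.0, 0.8],
                          [0.0, 0.0, 0.8, 2.0]])
    n_genes = precision.shape[0]
    covariance = np.linalg.inv(precision)
    expr = rng.multivariate_normal(np.zeros(n_genes), covariance, size=n_cells)
    expr = expr - expr.mean(axis=0)

    context = np.array([1.0, 0.0])  # one-hot code for a hypothetical condition
    weights = np.zeros((context.size, n_genes * n_genes))
    initial_loss = pseudo_likelihood_loss(
        expr, encode_context(context, weights, n_genes))
    for _ in range(n_steps):
        weights -= step_size * loss_gradient(expr, context, weights, n_genes)
    coef = encode_context(context, weights, n_genes)
    final_loss = pseudo_likelihood_loss(expr, coef)
    return initial_loss, final_loss, coef


initial_loss, final_loss, coef = run_demo()
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
print(np.round(coef, 2))
```

For a zero-mean Gaussian with precision matrix Θ, the population regression weight of gene k when predicting gene j is -Θ_kj / Θ_jj, so near-zero entries of `coef` correspond to absent edges in the dependency network. A realistic encoder would be nonlinear and trained with automatic differentiation across many conditions; the analytic gradient here only keeps the sketch dependency-light.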
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 141