On the role of drug representations in single-cell perturbation modeling

Published: 02 Mar 2026, Last Modified: 10 Mar 2026 | Gen² 2026 Poster | Everyone | CC BY 4.0
Track: Full / long paper (5-8 pages)
Keywords: drug representations, single-cell transcriptomics, large-scale perturbation data
Abstract: Predicting cellular responses to small-molecule perturbations is a central challenge in computational biology and drug discovery. Although recent single-cell foundation models learn rich representations of cellular state from large transcriptomic datasets, their performance on drug perturbation prediction remains limited. This raises the question of whether the shortcomings arise from how drugs are represented, how cells are represented, or how the two are coupled. We leverage the Tahoe-100M dataset, which contains single-cell perturbation screens across approximately 50 cell lines and 350 drugs, to study this question at scale. We construct biological-response–derived drug embeddings (BiRD embeddings) from transcriptional response similarities and show that they capture biologically relevant variation that chemical-structure descriptors and small-molecule foundation models such as ADMET-AI miss. We then introduce an optimal transport–based objective to align single-cell foundation model representations with the BiRD space. After fine-tuning, models achieve large gains on a perturbation-retrieval task, reaching an AUC of $0.8$–$0.9$, compared with $0.5$–$0.6$ for ADMET-based embeddings, and generalize to unseen cell lines. Together, these results highlight the importance of grounding drug representations in biological response data for accurate and transferable perturbation modeling.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 43
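
As a rough illustration of what an optimal transport–based alignment objective can look like, the sketch below computes an entropy-regularized transport plan (Sinkhorn iterations) between a batch of cell embeddings and a set of drug embeddings. This is not the paper's method: the abstract does not specify the cost function, the marginals, or the regularization, so the squared-Euclidean cost, the uniform marginals, the `eps` value, and the function names `sinkhorn_plan` and `ot_alignment_cost` are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic OT plan for an (n x m) cost matrix with uniform marginals.

    Hypothetical helper: uniform marginals and eps are assumptions.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # source marginal (cells)
    b = np.full(m, 1.0 / m)  # target marginal (drugs)
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def ot_alignment_cost(cell_emb, drug_emb, eps=0.1):
    """Transport cost between two embedding sets (assumed squared-Euclidean)."""
    diff = cell_emb[:, None, :] - drug_emb[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / cost.max()  # normalize so eps has a comparable scale
    plan = sinkhorn_plan(cost, eps=eps)
    return float((plan * cost).sum()), plan
```

In a fine-tuning setting, a quantity like the returned transport cost would typically be written in an autodiff framework and minimized with respect to the cell-side encoder; the NumPy version here only shows the computation itself.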