Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics

Published: 13 Oct 2024, Last Modified: 01 Dec 2024. Venue: AIDrugX Spotlight. License: CC BY 4.0.
Keywords: machine learning in therapeutics, datasets and benchmarks for ML therapeutics, biomedical artificial intelligence
TL;DR: Therapeutics Commons (TDC-2) presents a collection of datasets, tools, models, and benchmarks that integrate cell-type-specific contextual features with ML tasks across a range of therapeutic applications.
Abstract: Drug discovery AI datasets and benchmarks have not traditionally included biomarkers from single-cell analysis. Benchmarking efforts in single-cell analysis have recently released collections of single-cell tasks, but they have yet to comprehensively release datasets, models, and benchmarks that integrate a broad range of therapeutic discovery tasks with cell-type-specific biomarkers. Therapeutics Commons (TDC-2) presents datasets, tools, models, and benchmarks that integrate cell-type-specific contextual features with ML tasks across therapeutics. We present four tasks for contextual learning at single-cell resolution: drug-target nomination, genetic perturbation response prediction, chemical perturbation response prediction, and protein-peptide interaction prediction. We introduce datasets, models, and benchmarks for each of these four tasks. Finally, we detail the advances and challenges in machine learning and biology that drove the implementation of TDC-2, and how they are reflected in its architecture, its datasets and benchmarks, and its foundation model tooling.
Submission Number: 40