Connecting Preclinical Models to Patient Outcomes: A Machine Learning Dataset for Predictive Validity in Drug Development
Track: Track 2: Dataset Proposal Competition
Keywords: cancer, colorectal cancer, predictive validity, preclinical model, drug development, translational research, precision medicine
TL;DR: Adding a clear measurement of predictive validity, which captures how well a preclinical model-based dataset predicts activity in the patient setting, will unlock new opportunities in machine learning-enabled drug development.
Abstract: Cancer researchers test thousands of potential drugs every year, yet only 5% of those selected for clinical testing succeed in the patient setting [1]. This is fundamentally a prediction problem: we know that initial results in cancer cell lines don’t accurately reflect a drug’s efficacy in the clinic, yet we have to use results from various imperfect laboratory models to make expensive decisions about which potential drugs to invest in translating to clinical trials [2]. Predictive validity measures the accuracy of a laboratory model’s results relative to the same intervention’s ultimate performance in the clinic [3,4]. However, it is not routinely measured in any form of drug development, and it is absent from current datasets [5–7], both for common laboratory models and for patient data. This absence is expensive, both for drug developers and for machine learning (ML) research approaches, where the lack of clear predictive validity metrics restricts best-in-class ML predictions to the type of data on which they were trained. Here we propose the development of a vertically integrated colorectal cancer (CRC) dataset that characterizes patient samples and preclinical models over time to rigorously measure the predictive validity of each model for different drug perturbations. This dataset, as a common good for academia and industry alike, will enable clearer measurement of predictive value for both wet lab and ML model results. This in turn will empower researchers to develop new types of ML models that predict efficacy across multiple models, investors to assess clinical likelihood of success more rigorously, and clinicians to match patients to the treatment regimens most likely to deliver curative results.
Submission Number: 318