Keywords: Drug discovery, Transformers, Multi-modal, Multi-task, Molecular property prediction
Abstract: We introduce a 1B-parameter transformer model pre-trained from scratch on 2.25T tokens from a massive mixture of datasets centered on drug discovery. These datasets are heterogeneous, coming from dozens of sources and covering 15 data modalities. We demonstrate the model's capability on various molecular assay prediction tasks, including public benchmarks and internally generated holdouts from real-world drug discovery programs. Following parameter-efficient fine-tuning, the multi-modal transformer outperforms strong molecular property prediction baselines, including XGBoost and Chemprop, on multi-task prediction.
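The abstract's recipe of parameter-efficient fine-tuning on a pre-trained transformer is commonly realized with low-rank adapters. Below is a minimal sketch of that pattern using the HuggingFace `transformers` and `peft` libraries; the base checkpoint (`gpt2` as a stand-in for the 1B-parameter model), the adapter hyperparameters, and the multi-task head are illustrative assumptions, not the authors' actual setup.

```python
# Hedged sketch: LoRA fine-tuning plus a multi-task assay-prediction head.
# All names and hyperparameters below are hypothetical stand-ins.
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

base = AutoModel.from_pretrained("gpt2")  # stand-in for the 1B-parameter model
lora_config = LoraConfig(
    r=8,                        # low-rank adapter dimension (assumed)
    lora_alpha=16,              # adapter scaling factor (assumed)
    target_modules=["c_attn"],  # GPT-2 attention projections (model-specific)
    lora_dropout=0.05,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train


class MultiTaskHead(torch.nn.Module):
    """Linear head mapping pooled transformer features to N assay endpoints."""

    def __init__(self, hidden_size: int, num_tasks: int):
        super().__init__()
        self.proj = torch.nn.Linear(hidden_size, num_tasks)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Mean-pool over the token dimension, then predict all tasks at once.
        return self.proj(hidden_states.mean(dim=1))


head = MultiTaskHead(base.config.hidden_size, num_tasks=4)  # 4 assays, assumed
```

Only the adapter and head parameters are updated during fine-tuning, which keeps per-program adaptation cheap relative to full fine-tuning of the 1B-parameter backbone.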
Supplementary Material: pdf
Submission Number: 110