Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Multi-modal, Transformer, Drug discovery, Scaling Laws
TL;DR: We introduce Enchant v2, a large-scale multi-modal transformer for predicting molecular, biochemical, and pharmacological properties from heterogeneous biomedical data.
Abstract: We introduce Enchant v2, a large-scale multi-modal transformer for predicting molecular, biochemical, and pharmacological properties from heterogeneous biomedical data. The model addresses a core challenge in drug discovery: generalizing under extreme data sparsity and across incompatible modalities. Diverse inputs, including molecular graphs, protein sequences, assay measurements, and free text, are represented as unified token sequences processed by a single transformer. Pretraining on a large, curated corpus is followed by parameter-efficient fine-tuning for molecule property prediction. We show that Enchant v2 follows established transformer scaling laws, with performance improving predictably as pretraining compute increases. On public and proprietary benchmarks, including drug property prediction and internal pharmacology datasets, it consistently outperforms TxGemma and Enchant v1. Crucially, in real-world applications, Enchant v2 surpasses the current industry standard of in vitro screening: for example, it achieves an AUROC of 0.74 in classifying high versus low in vivo rat clearance, compared to just 0.51 when extrapolating from measured in vitro clearance values. In addition, the model produces calibrated uncertainty estimates that closely track observed hit rates in virtual screening tasks, enabling reliable hit identification and efficient prioritization of compounds in early discovery workflows. These findings suggest that scalable, modality-agnostic transformers can deliver robust generalization and substantial performance gains in real-world low-data drug discovery settings.
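To make the scaling-law claim concrete, here is a minimal sketch (not the authors' code) of the standard procedure for checking that loss follows a power law in pretraining compute, loss(C) ≈ a·C^(-b): fit a line in log-log space and extrapolate. The compute and loss values below are illustrative placeholders, not results from the paper.

```python
# Hedged sketch: fitting a power-law scaling curve, loss(C) = a * C**(-b).
# All numbers below are hypothetical stand-ins for a compute sweep.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])  # pretraining FLOPs (illustrative)
loss = np.array([2.10, 1.85, 1.64, 1.47])     # validation loss (illustrative)

# A power law is linear in log-log space: log L = log a - b * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted scaling law: loss(C) = {a:.3g} * C^(-{b:.3g})")

# If the fit holds, loss at a larger compute budget is predictable in advance.
print(f"predicted loss at 1e22 FLOPs: {a * (1e22) ** (-b):.3f}")
```

A close fit of held-out points to this line is what "performance improving predictably as pretraining compute increases" means operationally.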
Submission Number: 192