Scaling High-Throughput Experimentation Unlocks Robust Reaction-Outcome Prediction

Michał Sadowski; Lukasz Sztukiewicz; Maria Wyrzykowska; Tadija Radusinović; Piotr Byrski; Paweł Włodarczyk-Pruszyński; Bartosz Matysiak; Jan Kulczycki; Filip Ulatowski; Ruard van Workum; Pawel Dabrowski-Tumanski; Paulina Wach; Filip Chmielewski; Jan Rzymkowski; Mateusz Bruno-Kamiński; Jan Busz; Artur Chołuj; Mateja Duda; Tomasz Dybowski; Marco Farinone; Tomasz Jeliński; Alicja Karczewska; Paweł Kowalczyk; Marek Pietrzak; Łukasz Szczupak; Aleksander Szkółka; Grzegorz Wojciechowski; Stanislaw Kamil Jastrzebski


Published: 24 Sept 2025, Last Modified: 15 Oct 2025. NeurIPS 2025 AI4Science Poster. License: CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

Keywords: Deep learning, organic chemistry, high throughput experimentation

TL;DR: Scaling high-throughput experimentation unlocks robust reaction-outcome prediction

Abstract: Organic chemistry underpins small-molecule drug discovery, but unlike structural biology it lacks large, unbiased datasets for training broadly generalizable models. We report the largest microliter-scale high-throughput experimentation (HTE) campaign to date: $200{,}000$ reactions spanning three workhorse reaction classes (Amide Coupling, Suzuki Coupling, and Buchwald–Hartwig Coupling) and involving $30{,}000$ products, more than $4\times$ larger than the largest previously disclosed public dataset. This scale and diversity enable reaction-outcome predictors that generalize to unseen substrates. We introduce UniReact, a molecule-attention Transformer built on pretrained molecular encoders. Across the three reaction classes, our models achieve a PR-AUC $2$--$3\times$ that of a random baseline and a ROC-AUC of $70$--$86\%$. We further establish scaling laws for reaction-outcome prediction spanning three orders of magnitude of HTE data, reaching up to $100{,}000$ reactions for one class. To our knowledge, this is the broadest HTE scaling study to date. In a human study on prioritizing Suzuki coupling reactions, our models outperform PhD-level chemists (precision of $87.1\%$ vs. $60.8\%$ at $50\%$ recall). Finally, to the best of our knowledge, we present the first demonstration of zero-shot transfer to an external HTE dataset. Taken together, these results support scaled HTE as a viable path toward broadly applicable predictors of chemical reactivity that surpass human intuition and ultimately help discover novel chemistry.
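The abstract reports ranking metrics: PR-AUC relative to a random baseline, ROC-AUC, and precision at 50% recall. The sketch below shows one common way to compute two of these from binary reaction outcomes (1 = successful reaction) and model scores. It is for illustration only and is not the authors' evaluation code. The function names are hypothetical, and ties in scores are broken by input order, which is an assumption.

```python
import numpy as np

def _ranked_hits(y_true, scores):
    """Sort outcomes by descending score and return them with cumulative hit counts.

    Assumption: ties are broken by original input order (stable sort).
    """
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    sorted_true = y_true[order]
    if not sorted_true.any():
        raise ValueError("need at least one positive (successful) reaction")
    return sorted_true, np.cumsum(sorted_true)

def precision_at_recall(y_true, scores, target_recall=0.5):
    """Precision of the shortest top-ranked prefix that reaches target_recall."""
    sorted_true, hits = _ranked_hits(y_true, scores)
    recall = hits / sorted_true.sum()
    k = int(np.argmax(recall >= target_recall))  # first prefix meeting the target
    return hits[k] / (k + 1)

def average_precision(y_true, scores):
    """Step-wise PR-AUC (average precision): mean precision at each positive."""
    sorted_true, hits = _ranked_hits(y_true, scores)
    precision = hits / np.arange(1, len(sorted_true) + 1)
    return precision[sorted_true].mean()

def lift_over_random(y_true, scores):
    """Average precision divided by the positive rate (a random ranker's expected AP)."""
    return average_precision(y_true, scores) / np.asarray(y_true, dtype=bool).mean()

# Toy example: 3 successful reactions out of 8, already listed in score order.
y = [1, 0, 1, 0, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(precision_at_recall(y, s, 0.5), average_precision(y, s), lift_over_random(y, s))
```

A randomly ordered ranking has an expected average precision of roughly the positive rate. The lift ratio is therefore one way to read a statement like "PR-AUC $2$--$3\times$ over random". In the toy example, precision at 50% recall is 2/3, because the top three reactions contain two of the three successes.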

Submission Number: 253
