Additional Submission Instructions: For the camera-ready version, please include the author names and affiliations, funding disclosures, and acknowledgements.
Track: Track 2: Dataset Proposal Competition
Keywords: Machine Learning, AI, Chemistry, Chemoinformatics, Mechanisms, Reaction Prediction, Dataset
Abstract: The lack of openly accessible, well-curated reaction databases remains a major obstacle to data-driven research in chemistry. Many existing chemical datasets are proprietary and/or limited to unbalanced overall transformations that map reactants directly to products without revealing the underlying mechanisms, intermediates, or byproducts. As a result, machine learning models trained on such data often act as “black boxes,” predicting products without explaining how or why they form. To address this gap, we present the largest and most comprehensive publicly available dataset of manually curated elementary reaction steps, integrated into a platform that supports continuous curation, search, and community contribution at scale. The data cover polar and radical elementary steps, complete mechanistic pathways, and combinatorially generated mechanisms, with each reaction represented as a balanced, canonicalized SMIRKS string with reactive atom mapping and mechanistic annotations. By making mechanistic reaction data widely available, we aim to enable the development of more interpretable and more accurate machine learning models for reaction and pathway prediction.
We make the platform publicly available at https://deeprxn.ics.uci.edu/.
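To illustrate what "balanced" with "atom mapping" means for a single elementary step, the following minimal Python sketch checks that every mapped atom on the reactant side reappears on the product side with the same element and map number. It is not the platform's code: the function names, the regular expression, and the SN2 example string are assumptions made for illustration. It handles only simple bracketed atoms of the form `[CH3:1]`, assumes the `reactants>>products` form, and ignores hydrogen counts and charges.

```python
import re
from collections import Counter

# Hypothetical helper pattern: matches a simple bracketed, mapped atom such as
# [CH3:1] or [Br-:2] and captures the element symbol and the atom-map number.
# Atomic-number primitives like [#6:1] are deliberately not handled.
MAPPED_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]:]*:(\d+)\]")


def mapped_atoms(side: str) -> Counter:
    """Count (element, map number) pairs on one side of a reaction string."""
    return Counter(
        (symbol.capitalize(), int(number))
        for symbol, number in MAPPED_ATOM.findall(side)
    )


def is_mapping_balanced(reaction: str) -> bool:
    """Return True if both sides carry the same mapped atoms.

    Assumes the 'reactants>>products' form with no agents section.
    """
    reactants, _, products = reaction.partition(">>")
    left, right = mapped_atoms(reactants), mapped_atoms(products)
    return bool(left) and left == right


# Illustrative SN2 step (hydroxide displacing bromide); not taken from the dataset.
example = "[CH3:1][Br:2].[OH-:3]>>[CH3:1][OH:3].[Br-:2]"
print(is_mapping_balanced(example))
```

A check like this only catches missing or mislabeled mapped atoms. Verifying full mass and charge balance would require a cheminformatics toolkit that parses the complete structure.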
Submission Number: 246