Leveraging Pre-Trained LMs for Rapid and Accurate Structure Elucidation from 2D NMR Data

Published: 08 Oct 2024, Last Modified: 03 Nov 2024, AI4Mat-NeurIPS-2024, CC BY 4.0
Submission Track: LLMs for Materials Science - Short Paper
Submission Category: AI-Guided Design
Keywords: NMR, 2D NMR, Transformers, Fine-tuning LLMs, Automated Structure Elucidation
TL;DR: We leveraged a pre-trained T5 model for rapid and accurate structure elucidation, generating SMILES strings from 2D NMR data and the molecular formula and achieving state-of-the-art results.
Abstract: Molecular structure elucidation from NMR data is a crucial process in chemistry, particularly for small and medium-sized molecules in materials science. Despite advances in computational methods, traditional approaches remain time-consuming and data-intensive, which motivates more efficient, automated solutions. We propose a novel application of a pre-trained T5 transformer model to structure elucidation from 2D NMR data, marking the first such approach applied to experimental data. Our method generates SMILES strings representing molecular structures by conditioning on both the HSQC peaks and the molecular formula, achieving 74% accuracy. This surpasses the previous state of the art, which was obtained on simulated data. Because it builds on a pre-trained model, our approach requires significantly less data and compute. To our knowledge, this work is the first to apply LMs to automated structure elucidation from 2D NMR spectra, particularly on experimental data.
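The abstract states that the model is conditioned on HSQC peaks and the molecular formula but does not give the input format. One plausible way to feed both to a text-to-text model like T5 is to serialize them into a single source string, with the SMILES string as the target. The sketch below assumes each HSQC cross-peak is a (1H shift, 13C shift) pair in ppm; the prompt template and the function name `format_t5_input` are hypothetical illustrations, not the authors' encoding.

```python
from typing import List, Tuple


def format_t5_input(
    formula: str,
    hsqc_peaks: List[Tuple[float, float]],
    decimals: int = 1,
) -> str:
    """Serialize a molecular formula and HSQC cross-peaks into one text prompt.

    Hypothetical template: each peak is (1H shift, 13C shift) in ppm, and peaks
    are sorted by 13C shift (then 1H shift) so the same spectrum always maps to
    the same string regardless of the order in which peaks were picked.
    """
    ordered = sorted(hsqc_peaks, key=lambda peak: (peak[1], peak[0]))
    # Round shifts to a fixed precision to keep the token sequence compact.
    peak_text = " | ".join(
        f"{h:.{decimals}f},{c:.{decimals}f}" for h, c in ordered
    )
    return f"formula: {formula} hsqc: {peak_text}"


if __name__ == "__main__":
    # Approximate HSQC peaks for ethanol: CH3 (1.2 / 18.0 ppm), CH2 (3.7 / 58.0 ppm).
    source = format_t5_input("C2H6O", [(3.7, 58.0), (1.2, 18.0)])
    target = "CCO"  # SMILES the model would be trained to generate
    print(source)
    print(target)
```

In a fine-tuning setup, `source` would be tokenized as the encoder input and `target` as the decoder label. Sorting the peaks is a design choice that removes spurious order dependence from the input. The actual paper may serialize, bin, or order the peaks differently.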
AI4Mat Journal Track: Yes
Submission Number: 41