Submission Track: Short Paper
Submission Category: Automated Material Characterization
Keywords: molecular elucidation, IR, NMR, LLM, vision models, SELFIES, molecules
TL;DR: We propose a multi-modal approach for the spectra-to-molecule task, which decodes molecular structures from IR and NMR spectra.
Abstract: Molecular structure elucidation is a crucial but challenging step in the characterization of materials, because the number of possible structures is very large. Here, we introduce Spectro, a multi-modal approach for molecular elucidation that combines $^{13}\ce{C}$ and $^{1}\ce{H}$ NMR data with IR spectra. Spectro translates embedded representations of the spectra into molecular structures expressed in the SELFIES notation. To embed the IR data, we used a vision model pretrained to detect the peaks of relevant functional groups in IR spectra, a task on which it achieved an F1 score of 91\%. For the NMR data, we used LLM2Vec, treating the NMR spectra as text.
By integrating multiple spectroscopic techniques, Spectro achieves an overall test accuracy of 93\% when trained jointly with the IR vision model, and 82\% when trained with fixed embeddings. Our approach demonstrates the potential of multi-modal learning for complex molecular characterization tasks.
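Treating NMR spectra as text requires first serializing each peak list into a string that a text encoder such as LLM2Vec can embed. The sketch below shows one way to do this. It is an illustration under stated assumptions, not Spectro's actual preprocessing: the `Peak` class, the helper function names, and the "1H NMR: ... | 13C NMR: ..." string format are all hypothetical, and the peak values in the usage note are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    """One 1H NMR signal (hypothetical container, not from the paper)."""
    shift_ppm: float        # chemical shift in ppm
    multiplicity: str = ""  # e.g. "s", "d", "t", "q", "m"
    integral: float = 0.0   # relative number of protons


def h_nmr_to_text(peaks: List[Peak]) -> str:
    """Serialize a 1H peak list, ordered from downfield to upfield."""
    ordered = sorted(peaks, key=lambda p: p.shift_ppm, reverse=True)
    parts = [f"{p.shift_ppm:.2f} ({p.multiplicity}, {p.integral:g}H)" for p in ordered]
    return "1H NMR: " + ", ".join(parts)


def c_nmr_to_text(shifts: List[float]) -> str:
    """Serialize 13C shifts (ppm), ordered from downfield to upfield."""
    ordered = sorted(shifts, reverse=True)
    return "13C NMR: " + ", ".join(f"{s:.1f}" for s in ordered)


def nmr_to_text(h_peaks: List[Peak], c_shifts: List[float]) -> str:
    """Join both spectra into one string for a text encoder (assumed format)."""
    return h_nmr_to_text(h_peaks) + " | " + c_nmr_to_text(c_shifts)


if __name__ == "__main__":
    # Illustrative values only; not taken from the paper's dataset.
    text = nmr_to_text([Peak(1.22, "t", 3), Peak(3.69, "q", 2)], [18.4, 58.3])
    print(text)  # 1H NMR: 3.69 (q, 2H), 1.22 (t, 3H) | 13C NMR: 58.3, 18.4
```

Sorting peaks by shift gives the encoder a canonical ordering, so the same spectrum always yields the same string regardless of how the peaks were listed. The resulting string would then be passed to the text encoder to produce the NMR embedding.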
AI4Mat Journal Track: Yes
Submission Number: 78