NMIRacle: Multi-modal Generative Molecular Elucidation from IR and NMR Spectra

Published: 02 Mar 2026, Last Modified: 02 Mar 2026
Venue: AI4Mat-ICLR-2026 Poster
License: CC BY 4.0
Keywords: AI4Science, Molecular structure elucidation, Spectroscopy
TL;DR: NMIRacle is a two-stage generative model that uses count-aware fragment representations and multi-spectra attention to elucidate molecular structures directly from raw IR and NMR spectra.
Abstract: Molecular structure elucidation from spectroscopic data is a long-standing challenge in chemistry, traditionally requiring expert interpretation. We introduce NMIRacle, a two-stage generative framework that builds upon recent paradigms in AI-driven spectroscopy with minimal assumptions. In the first stage, NMIRacle trains a generator to reconstruct molecular structures from count-aware fragment representations, which capture both the identity of each fragment and its number of occurrences. In the second stage, a spectral encoder maps the input spectra (IR, $^1$H-NMR, $^{13}$C-NMR) into a latent embedding that conditions the pre-trained generator, which is then fine-tuned for direct spectra-to-molecule generation. This formulation bridges fragment-level chemical modeling with spectral evidence, yielding accurate molecular predictions. Empirical results demonstrate that NMIRacle outperforms existing baselines on molecular elucidation while maintaining robust performance across increasing levels of molecular complexity.
Submission Track: Full Paper
Submission Category: Automated Material Characterization
Submission Number: 24
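
Illustrative sketch (not from the paper): the abstract describes "count-aware fragment representations" that capture both fragment identities and their occurrences. The toy Python example below shows, under stated assumptions, why counts can matter compared with a presence-only encoding. The vocabulary `VOCAB`, the fragment labels, and the helper names `presence_vector` and `count_vector` are hypothetical and chosen only for illustration; NMIRacle's actual fragmentation scheme and encoding are not specified here and may differ.

```python
from collections import Counter

# Hypothetical fragment vocabulary; the labels are illustrative only and are
# not taken from the paper.
VOCAB = ["C(=O)O", "c1ccccc1", "CH3", "OH", "NH2"]


def presence_vector(fragments, vocab=VOCAB):
    """Binary encoding: 1 if the fragment occurs at least once, else 0."""
    present = set(fragments)
    return [1 if frag in present else 0 for frag in vocab]


def count_vector(fragments, vocab=VOCAB):
    """Count-aware encoding: number of occurrences of each fragment."""
    counts = Counter(fragments)
    return [counts[frag] for frag in vocab]


# Two toy fragment lists that differ only in how often "CH3" occurs.
one_methyl = ["c1ccccc1", "CH3"]
two_methyl = ["c1ccccc1", "CH3", "CH3"]

# The presence-only encoding cannot tell the two apart...
print(presence_vector(one_methyl) == presence_vector(two_methyl))  # True
# ...while the count-aware encoding can.
print(count_vector(one_methyl) == count_vector(two_methyl))  # False
```

In this toy setting the two inputs collapse to the same binary vector but remain distinct once counts are kept, which is the kind of information a count-aware representation preserves for a downstream generator.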