Keywords: multi-modal spectroscopy, molecular structure elucidation, information bottleneck
TL;DR: We propose MSpecTmol, a multi-modal spectrum fusion framework that extends information bottleneck theory, achieving state-of-the-art results in molecular classification and conformation generation.
Abstract: Spectroscopic techniques are indispensable for the elucidation of molecular structures, particularly for novel molecules with unknown configurations. However, a fundamental limitation of any single spectroscopic modality is that it provides an inherently circumscribed and fragmented view, capturing only specific facets of the complete molecular structure, which is often insufficient for unequivocal and robust characterization. Consequently, integrating data from multiple spectroscopic sources is imperative to overcome these intrinsic limitations and achieve a comprehensive and accurate structural characterization. In this work, we introduce \textbf{MSpecTmol}, a novel \textbf{M}ulti-modal \textbf{Spec}trum information fusion learning framework for \textbf{Mol}ecule structure elucidation. By extending information bottleneck theory, our framework provides a principled and adaptive approach to fusing spectra. It designates a primary modality to extract core molecular features while leveraging auxiliary inputs to enrich the representation. To validate the end-to-end effectiveness of our framework, we design a two-fold evaluation: molecular substructure classification, which probes its discriminative power in identifying substructures, and molecular conformation generation, which tests whether this knowledge extends to reconstructing plausible 3D structures. Our results not only demonstrate state-of-the-art performance in molecular substructure classification but also achieve near-experimental accuracy (\textasciitilde 0.68\AA) in molecular conformation reconstruction. These findings underscore the model’s capacity to learn interpretable features aligned with chemical intuition, thereby paving the way for future advances in automated and reliable spectroscopic analysis. Our code can be found at \href{https://anonymous.4open.science/r/MspecTmol-6B4D}{https://anonymous.4open.science}.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 6595