Prefix-Tree Decoding for Predicting Mass Spectra from Molecules

Published: 21 Sept 2023, Last Modified: 02 Nov 2023
Venue: NeurIPS 2023 spotlight
Keywords: molecules, prefix tree, mass spectra, mass spectrum prediction, metabolomics, GNNs, chemistry, biology
TL;DR: Predicting mass spectra from molecules by first predicting molecular formulae (factorized as an autoregressive prefix tree generation task) and then estimating the intensity of each formula peak (set2set transformer).
Abstract: Computational predictions of mass spectra from molecules have enabled the discovery of clinically relevant metabolites. However, such predictive tools are still limited as they occupy one of two extremes, either operating (a) by fragmenting molecules combinatorially with overly rigid constraints on potential rearrangements and poor time complexity or (b) by decoding lossy and nonphysical discretized spectra vectors. In this work, we use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms. After first encoding an input molecular graph, we decode a set of molecular subformulae, each of which specifies a predicted peak in the mass spectrum; the peak intensities are predicted by a second model. Our key insight is to overcome the combinatorial possibilities for molecular subformulae by decoding the formula set using a prefix tree structure, atom-type by atom-type, representing a general method for ordered multiset decoding. We show promising empirical results on mass spectra prediction tasks.
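To make the atom-type-by-atom-type decoding concrete, here is a minimal Python sketch of expanding a prefix tree over subformulae. It is an illustration under stated assumptions, not the authors' implementation: the names `decode_subformulae`, `prefix_oracle`, and `keep` are hypothetical, and the learned model that decides which tree nodes to expand is replaced by a toy oracle built from a known set of target formulae. The second-stage intensity model is not sketched.

```python
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[int, ...]  # atom counts chosen so far, one per element in `order`


def decode_subformulae(
    root: Dict[str, int],
    order: List[str],
    keep: Callable[[Prefix], bool],
) -> List[Dict[str, int]]:
    """Expand a prefix tree level by level, one atom type per level."""
    frontier: List[Prefix] = [()]  # the empty prefix is the tree root
    for element in order:
        next_frontier: List[Prefix] = []
        for prefix in frontier:
            # A subformula holds between 0 and root[element] atoms of this type.
            for count in range(root.get(element, 0) + 1):
                child = prefix + (count,)
                # `keep` is a stand-in (assumption) for a learned per-node decision.
                if keep(child):
                    next_frontier.append(child)
        frontier = next_frontier
    return [dict(zip(order, leaf)) for leaf in frontier]


def prefix_oracle(
    targets: List[Dict[str, int]], order: List[str]
) -> Callable[[Prefix], bool]:
    """Toy oracle: accept a node only if it is a prefix of some target formula."""
    allowed = set()
    for formula in targets:
        counts = tuple(formula.get(e, 0) for e in order)
        for i in range(1, len(counts) + 1):
            allowed.add(counts[:i])
    return lambda prefix: prefix in allowed


# Hypothetical example: subformulae of ethanol (C2H6O), chosen for illustration only.
order = ["C", "H", "O"]
root = {"C": 2, "H": 6, "O": 1}
targets = [{"C": 2, "H": 5, "O": 1}, {"C": 2, "H": 5}, {"C": 1, "H": 3, "O": 1}]
decoded = decode_subformulae(root, order, prefix_oracle(targets, order))
```

Because rejected prefixes are never expanded, the number of nodes visited depends on the size of the accepted set rather than on the full product of per-element count ranges, which is the motivation the abstract gives for the prefix tree.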
Supplementary Material: pdf
Submission Number: 1939