Abstract: When confronted with a substance of unknown identity,
researchers often perform mass spectrometry on the sample and compare the
observed spectrum to a library of previously collected spectra to identify the
molecule. While popular, this approach will fail to identify molecules that are
not in the existing library. In response, we propose to improve the library’s
coverage by augmenting it with synthetic spectra that are predicted from
candidate molecules using machine learning. We contribute a lightweight
neural network model that quickly predicts mass spectra for small molecules,
averaging 5 ms per molecule with a recall-at-10 accuracy of 91.8%. Achieving
high-accuracy predictions requires a novel neural network architecture that is
designed to capture typical fragmentation patterns from electron ionization.
We analyze the effects of our modeling innovations on library matching
performance and compare our models to prior machine-learning-based work
on spectrum prediction.
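
As a rough illustration of the library-matching setup described above, the following minimal sketch ranks library spectra against a query and computes recall-at-k. It makes several assumptions not stated in the abstract: spectra are represented as fixed-length intensity vectors over m/z bins, and cosine similarity is the matching score. The function names, molecule names, and toy intensities are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query, library):
    """Return library molecule names sorted from best to worst match."""
    scored = [(cosine_similarity(query, spec), name)
              for name, spec in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


def recall_at_k(queries, library, k):
    """Fraction of queries whose true molecule is among the top-k matches."""
    hits = 0
    for true_name, spectrum in queries:
        if true_name in rank_library(spectrum, library)[:k]:
            hits += 1
    return hits / len(queries)


# Toy spectra over 4 m/z bins (hypothetical values).
measured_library = {
    "mol_A": [0.0, 1.0, 0.2, 0.0],
    "mol_B": [0.5, 0.0, 0.0, 1.0],
}

# Augmented library: measured spectra plus one spectrum that stands in
# for a model prediction (here just a hand-written vector).
augmented_library = dict(measured_library)
augmented_library["mol_C"] = [0.1, 0.1, 1.0, 0.3]

# Each query is (true molecule name, observed spectrum).
queries = [
    ("mol_A", [0.0, 0.9, 0.3, 0.0]),
    ("mol_C", [0.0, 0.2, 1.0, 0.2]),
]

recall_measured = recall_at_k(queries, measured_library, k=1)
recall_augmented = recall_at_k(queries, augmented_library, k=1)
```

In this toy setting, the query for mol_C cannot be matched against the measured-only library because mol_C is absent from it, whereas the augmented library contains a synthetic entry that the query can be ranked against.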