A Foundation Model for Mass Spectrometry Proteomics

ICML 2025 Workshop FM4LS, Submission 25

Published: 12 Jul 2025, Last Modified: 12 Jul 2025. Venue: FM4LS 2025. License: CC BY 4.0

Keywords: proteomics, mass spectrometry, foundation model

TL;DR: We develop a foundation model for mass spectrometry proteomics, and demonstrate the utility of its learned spectrum representations on a variety of downstream tasks.

Abstract: Mass spectrometry is the dominant technology in proteomics, enabling high-throughput analysis of the protein content of complex biological samples. Because the instrumentation and the resulting data are complex, sophisticated computational methods are required to process and interpret acquired mass spectra. Machine learning has shown great promise for improving the analysis of mass spectrometry data, and numerous purpose-built methods that improve specific steps in the data acquisition and analysis pipeline have reached widespread adoption. Here, we propose unifying various spectrum prediction tasks under a single foundation model. To this end, we pre-train a spectrum encoder using de novo sequencing as the pre-training task. We then show that using these pre-trained spectrum embeddings improves performance on four downstream tasks: spectrum quality prediction, chimericity prediction, phosphorylation prediction, and glycosylation status prediction. This demonstrates that our foundation model has learned generalizable representations of mass spectra.
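The downstream use of pre-trained spectrum embeddings described in the abstract can be illustrated with a linear probe: a small classifier trained on top of fixed embeddings for a binary task such as a chimeric versus non-chimeric label. The sketch below is only an illustration under stated assumptions and is not the authors' method. The embeddings are random placeholders rather than outputs of the actual encoder, the labels are synthetic, and the names `train_linear_probe` and `predict` and the embedding dimension of 16 are hypothetical.

```python
import numpy as np


def train_linear_probe(embeddings, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression probe on fixed embeddings via batch gradient descent."""
    n, d = embeddings.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = embeddings @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of cross-entropy loss w.r.t. the logits
        w -= lr * (embeddings.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def predict(embeddings, w, b):
    """Return hard 0/1 predictions from the probe."""
    return (embeddings @ w + b > 0).astype(int)


# ASSUMPTION: random vectors stand in for pre-trained spectrum embeddings,
# and the binary labels are synthetic (linearly separable by construction).
rng = np.random.default_rng(0)
direction = rng.normal(size=16)
X = rng.normal(size=(400, 16))
y = (X @ direction > 0).astype(float)

# Train on the first 300 placeholder spectra, evaluate on the remaining 100.
w, b = train_linear_probe(X[:300], y[:300])
accuracy = (predict(X[300:], w, b) == y[300:]).mean()
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In practice, `X` would be replaced by embeddings produced by the pre-trained spectrum encoder, and the same probe could be refit for each downstream label. Whether the encoder is kept frozen or fine-tuned is not specified in the abstract.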

Submission Number: 25
