Keywords: de novo peptide sequencing
Abstract: \emph{De novo} peptide sequencing, a fundamental task in proteomics, aims to infer a peptide sequence directly from its mass spectrum. However, missing peaks are common in mass spectrometry-based \emph{de novo} peptide sequencing and significantly reduce the accuracy of peptide reconstruction. Existing methods rely mainly on the original mass-to-charge ratios and intensities of the observed peaks, and fail to reconstruct a complete peptide sequence when key peaks are missing. To address this issue, we propose PeakNovo, a \emph{de novo} peptide sequencing method that integrates candidate mass spectra search with masked self-distillation. To make the mass spectrometry (MS) encoder more robust to missing peaks, the Masked Self-distillation module feeds a masked local spectrum into the student branch and the complete global spectrum into the teacher branch, then aligns the two. This alignment enables the encoder to learn consistent representations across local and global views, mitigating the effect of missing peaks. Meanwhile, the MS Fusion module retrieves from a database a set of candidate spectra similar to the input spectrum. These candidate spectra are fused with the representation of the input spectrum to provide supplementary information about potentially missing peaks. Experiments on benchmark datasets show that PeakNovo consistently outperforms existing methods, achieving state-of-the-art performance.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 6291