PeakNovo: Towards the Robust De Novo Peptide Sequencing for Missing Spectral Peaks

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: de novo peptide sequencing
Abstract: De novo peptide sequencing, a fundamental task in proteomics, aims to infer the peptide sequence that produced a given mass spectrum. However, missing peaks are common in mass spectrometry data and significantly reduce the accuracy of de novo peptide reconstruction. Existing methods rely mainly on the observed mass-to-charge ratios (m/z) and intensities of the peaks, and fail to reconstruct a complete peptide sequence when key peaks are missing. To address this issue, we propose PeakNovo, a de novo peptide sequencing method that integrates candidate spectrum retrieval with masked self-distillation. To make the mass spectrometry (MS) encoder robust to missing peaks, the Masked Self-Distillation module feeds a masked local spectrum into the student branch and the complete global spectrum into the teacher branch, then aligns the two representations. This enables the encoder to learn consistent representations from local to global views, mitigating the effect of missing peaks. In addition, PeakNovo employs an MS Fusion module that retrieves from a database a set of candidate spectra similar to the input spectrum. These candidate spectra are fused with the representation of the input spectrum to supply information about potentially missing peaks. Experiments on benchmark datasets show that PeakNovo consistently outperforms existing methods and achieves state-of-the-art performance.
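The two mechanisms named in the abstract (aligning a masked student view with a complete teacher view, and retrieving similar spectra to fuse with the input) can be illustrated with a minimal NumPy sketch. Every concrete choice here is a hypothetical assumption for illustration, not PeakNovo's implementation, which the abstract does not specify: the binned-spectrum featurisation `bin_spectrum`, the random peak dropout `mask_peaks`, the linear stand-in encoder `encode`, the cosine alignment loss `distill_loss`, and the softmax-weighted `retrieve_and_fuse` step.

```python
import numpy as np

def bin_spectrum(mz, intensity, max_mz=2000.0, bin_width=1.0):
    # Hypothetical featurisation: sum peak intensities into fixed-width m/z bins,
    # then L2-normalise so spectra can be compared by cosine similarity.
    n_bins = int(max_mz / bin_width)
    vec = np.zeros(n_bins)
    idx = np.clip((np.asarray(mz, dtype=float) / bin_width).astype(int), 0, n_bins - 1)
    np.add.at(vec, idx, np.asarray(intensity, dtype=float))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def mask_peaks(mz, intensity, mask_ratio, rng):
    # Student (local) view: randomly drop a fraction of peaks to mimic missing peaks.
    keep = rng.random(len(mz)) >= mask_ratio
    return np.asarray(mz)[keep], np.asarray(intensity)[keep]

def encode(weights, vec):
    # Stand-in for the MS encoder: a single linear map (assumption).
    return weights @ vec

def distill_loss(student_emb, teacher_emb):
    # Alignment objective (assumed): 1 - cosine similarity between the two views.
    # In training, the teacher embedding would be a fixed target (no gradient).
    cos = student_emb @ teacher_emb / (
        np.linalg.norm(student_emb) * np.linalg.norm(teacher_emb) + 1e-12)
    return 1.0 - cos

def retrieve_and_fuse(query_vec, db_vecs, query_emb, db_embs, k=3, temperature=0.1):
    # Fusion sketch (assumed): select the k database spectra most similar to the
    # query (cosine similarity of L2-normalised binned vectors), then add a
    # softmax-weighted average of their embeddings to the query embedding.
    sims = db_vecs @ query_vec
    top = np.argsort(-sims)[:k]
    w = np.exp(sims[top] / temperature)
    w /= w.sum()
    fused = query_emb + w @ db_embs[top]
    return fused, top

# Usage on a toy spectrum (made-up peak list, not data from the paper).
rng = np.random.default_rng(0)
mz = np.array([175.1, 304.2, 433.2, 546.3, 659.4, 772.5])
intensity = np.array([0.8, 1.0, 0.6, 0.9, 0.4, 0.7])
W = rng.normal(size=(16, 2000))  # hypothetical encoder weights

teacher_emb = encode(W, bin_spectrum(mz, intensity))  # complete (global) view
s_mz, s_int = mask_peaks(mz, intensity, mask_ratio=0.3, rng=rng)
student_emb = encode(W, bin_spectrum(s_mz, s_int))    # masked (local) view
loss = distill_loss(student_emb, teacher_emb)

# Toy "database": the query spectrum plus two spectra with non-overlapping peaks.
db = [(mz, intensity),
      (np.array([200.0, 300.0]), np.array([1.0, 1.0])),
      (np.array([900.0, 1100.0]), np.array([0.5, 1.0]))]
db_vecs = np.stack([bin_spectrum(m, i) for m, i in db])
db_embs = db_vecs @ W.T
fused, top = retrieve_and_fuse(bin_spectrum(mz, intensity), db_vecs,
                               teacher_emb, db_embs, k=2)
```

Minimising `distill_loss` over many random masks would push the encoder to map a spectrum with missing peaks close to its complete version; the retrieval step, in turn, lets peaks present in similar database spectra contribute to the representation of the query.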
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 6291