Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing

Zijie Qiu; Jiaqi Wei; Xiang Zhang; Sheng Xu; Kai Zou; Zhi Jin; ZhiQiang Gao; Nanqing Dong; Siqi Sun

Published: 01 May 2025, Last Modified: 23 Jul 2025, Venue: ICML 2025 poster, License: CC BY 4.0

TL;DR: RankNovo is a novel deep reranking framework that improves de novo peptide sequencing by combining strengths of multiple models.

Abstract: De novo peptide sequencing is a critical task in proteomics. However, the performance of current deep learning-based methods is limited by the inherent complexity of mass spectrometry data and the heterogeneous distribution of noise signals, leading to data-specific biases. We present RankNovo, the first deep reranking framework that enhances de novo peptide sequencing by leveraging the complementary strengths of multiple sequencing models. RankNovo employs a list-wise reranking approach, modeling candidate peptides as multiple sequence alignments and utilizing axial attention to extract informative features across candidates. Additionally, we introduce two new metrics, PMD (**P**eptide **M**ass **D**eviation) and RMD (**R**esidual **M**ass **D**eviation), which offer fine-grained supervision by quantifying mass differences between peptides at both the sequence and residue levels. Extensive experiments demonstrate that RankNovo not only surpasses the base models used to generate its training candidates for reranking pre-training, but also sets a new state-of-the-art benchmark. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models (those whose generations were not exposed during training), highlighting its robustness and potential as a universal reranking framework for peptide sequencing. Our work presents a novel reranking strategy that fundamentally challenges existing single-model paradigms and advances the frontier of accurate de novo sequencing. Our source code is provided on GitHub.
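To make the idea of comparing candidate peptides by mass more concrete, here is a minimal sketch. It is not the paper's definition of PMD or RMD, which this abstract does not spell out. The function names `peptide_mass`, `sequence_mass_gap`, and `residue_mass_gaps` are hypothetical, and the position-wise comparison is an assumption made for illustration only. The residue masses are standard monoisotopic values in daltons.

```python
from itertools import zip_longest

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free termini


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def sequence_mass_gap(a: str, b: str) -> float:
    """Hypothetical sequence-level score: absolute difference in total mass."""
    return abs(peptide_mass(a) - peptide_mass(b))


def residue_mass_gaps(a: str, b: str) -> list[float]:
    """Hypothetical residue-level score: position-wise mass differences.

    Assumption: positions are compared without alignment, and a missing
    residue in the shorter peptide counts as zero mass.
    """
    gaps = []
    for x, y in zip_longest(a, b):
        mass_x = RESIDUE_MASS[x] if x is not None else 0.0
        mass_y = RESIDUE_MASS[y] if y is not None else 0.0
        gaps.append(abs(mass_x - mass_y))
    return gaps


if __name__ == "__main__":
    # Isoleucine and leucine are isobaric, so a mass-based score sees no gap.
    print(sequence_mass_gap("PEPTIDE", "PEPTLDE"))
    # Lysine and glutamine differ only slightly in mass.
    print(sequence_mass_gap("AK", "AQ"))
    print(residue_mass_gaps("AK", "AQ"))
```

Under these assumptions, swapping I for L leaves both scores at zero, while swapping K for Q produces a gap of roughly 0.036 Da. This shows why a mass-based signal can grade near-miss candidates more finely than an exact-match label.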

Lay Summary: Identifying the exact structure of proteins is crucial for understanding how our bodies work and for developing new drugs, but current AI methods struggle with the noisy data from lab instruments used to analyze proteins. We created RankNovo, a new AI system that combines the strengths of multiple protein analysis models instead of relying on just one. RankNovo acts like a panel of experts that reviews multiple possible protein structures and selects the most accurate one by considering how the candidates relate to each other. Our approach significantly improves protein identification accuracy compared to existing methods. Remarkably, RankNovo can even enhance the performance of protein analysis tools it wasn't specifically trained on, making it a versatile solution for researchers across different labs and experiments. This advancement will help scientists better understand diseases and develop more effective treatments by providing more reliable protein analysis.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Link To Code: https://github.com/BEAM-Labs/denovo

Primary Area: Applications->Everything Else

Keywords: Peptide Sequencing, De novo, Reranking

Submission Number: 3222
