Predicting Kinase-Specific Phosphorylation Sites with Pretrained Protein Language Models

Published: 24 Sept 2025, Last Modified: 26 Dec 2025
Venue: NeurIPS2025-AI4Science Poster
License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: protein language model, phosphorylation, kinase, post-translational modification
Abstract: Accurately predicting kinase-specific phosphorylation sites remains difficult because kinases are diverse and substrate recognition is context-dependent. Aberrant kinase overactivation is a hallmark of many cancers, including colorectal, gastric, liver, and breast tumors, where dysregulated kinase signaling promotes malignant transformation, tumor progression, and therapy resistance. This underscores the clinical importance of understanding kinase-substrate relationships and precisely mapping phosphorylation events. In this paper, we introduce two complementary sequence-based architectures, developed in two stages, that operate directly on full-length substrate and kinase sequences. Stage 1 extends Prot2Token, a task-agnostic prediction method, to jointly support three tasks: kinase-group classification from substrate sequences alone, kinase-substrate interaction prediction, and kinase-specific phosphorylation-site prediction. It also incorporates a self-supervised decoder pretraining task that predicts amino-acid positions from encoder embeddings, and this pretraining substantially strengthens site prediction. Stage 2 specializes the architecture for phosphorylation-site prediction by replacing Prot2Token's causal decoding with bidirectional decoding, yielding further gains. On standard benchmarks, the specialized model consistently outperforms widely used baselines. In evaluations on understudied dark kinases, covering both in-distribution and zero-shot settings, we observe signs of zero-shot kinase-specific phosphorylation-site prediction capability. Together, these results indicate that jointly modeling substrate and kinase sequences provides a straightforward, scalable approach to state-of-the-art, zero-shot-capable phosphorylation-site prediction.
Submission Number: 112
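
The abstract contrasts causal decoding (Stage 1) with bidirectional decoding (Stage 2) without giving implementation details. As a generic illustration only, and not the authors' code, the difference can be sketched as an attention mask over decoder positions. The function names below (`causal_mask`, `bidirectional_mask`, `visible_positions`) are hypothetical and chosen for this example.

```python
# Generic sketch of causal vs. bidirectional attention masks.
# Assumption: this illustrates the general technique only; the paper's
# actual decoder implementation is not described in the abstract.

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """Indices that position i is allowed to attend to under the given mask."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


if __name__ == "__main__":
    n = 4
    print(visible_positions(causal_mask(n), 1))         # left context only
    print(visible_positions(bidirectional_mask(n), 1))  # full context
```

Under a causal mask, a prediction at one position cannot use context to its right. A bidirectional mask removes that restriction, which is the kind of change the abstract credits with further gains in site prediction.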