DeepNhKcr: Explainable Deep Learning Framework for the Prediction of Crotonylation Sites of Non-Histone Lysine in Plants Based on Pre-Trained Protein Language Model

Jinwei Wang, Zhenjie Luo, Aoyun Geng, Junlin Xu, Yajie Meng, Shankai Yan, Leyi Wei, Zilong Zhang, Qingchen Zhang, Quan Zou, Feifei Cui

Published: 01 Jan 2026, Last Modified: 17 Feb 2026. Venue: IEEE Transactions on Computational Biology and Bioinformatics. License: CC BY-SA 4.0
Abstract: Lysine crotonylation (Kcr) is an important post-translational protein modification that plays an essential role in a range of biological processes in both plants and animals, including the regulation of gene expression, the maintenance of cellular metabolic balance, and the enhancement of photosynthesis. Detecting Kcr sites is essential for uncovering their biological functions. However, conventional experimental detection approaches are often time-consuming, expensive, and hindered by technical constraints, which makes the precise identification of Kcr sites a significant challenge. This study develops a computational approach for the rapid and accurate prediction of Kcr sites in plant non-histone proteins. We introduce a novel deep learning framework named DeepNhKcr, which integrates the protein language model ESM2 with a bidirectional long short-term memory (BiLSTM) network. To address data imbalance, the model replaces the conventional cross-entropy loss with the focal loss function. In addition, DeepNhKcr combines advanced deep learning approaches with traditional protein encoding strategies to enable effective feature extraction and integration. This method not only significantly improves the accuracy of predicting Kcr sites in plant non-histone proteins but also provides interpretability: interpretability analysis techniques are used to investigate the links between key sequence characteristics and their biological roles. DeepNhKcr surpasses existing machine learning and deep learning models in both five-fold cross-validation and independent test experiments.
DeepNhKcr acts as a powerful method for detecting Kcr sites in plant non-histone proteins and is anticipated to greatly advance future studies in plant Kcr site prediction.
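
The abstract states that focal loss replaces cross-entropy to handle class imbalance. As a rough illustration of why that helps, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), next to plain binary cross-entropy. The function names and the values gamma = 2.0 and alpha = 0.25 are illustrative assumptions; the abstract does not report the hyperparameters or the implementation used in DeepNhKcr.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over predicted probabilities of the positive class."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p      # probability assigned to the true class
        p_t = min(max(p_t, eps), 1.0)       # clamp to avoid log(0)
        total += -math.log(p_t)
    return total / len(probs)


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss.

    gamma and alpha are assumed illustrative values, not taken from the paper.
    gamma down-weights well-classified examples; alpha weights the positive class.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        p_t = min(max(p_t, eps), 1.0)
        # The modulating factor (1 - p_t)^gamma shrinks the loss of easy examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # One easy positive (p = 0.9) and one hard positive (p = 0.1).
    for p in (0.9, 0.1):
        ce = binary_cross_entropy([p], [1])
        fl = binary_focal_loss([p], [1])
        print(f"p={p}: cross-entropy={ce:.4f}, focal={fl:.6f}")
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy. Larger gamma shifts the training signal toward hard, misclassified sites, which is the usual motivation when positive sites are far rarer than negatives.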