BERTPVP: Identifying and Classifying Phage Virion Proteins Using Bidirectional Encoder Representations-Based Transformers
Abstract: Phage virion proteins (PVPs), which form the structural components of phages, are crucial for maintaining phage structures and infecting host bacteria. Identifying PVPs can lead to the development of novel therapeutic agents to combat bacterial infections, and the task has attracted great research attention in recent years. However, most existing methods for PVP identification depend heavily on the effectiveness of feature extraction and cannot precisely classify PVPs into specific classes. In this article, we propose a bidirectional encoder representations-based Transformer model, called BERTPVP, for the identification and classification of PVPs. BERTPVP uses a stack of Transformer encoders to effectively capture contextual information from the entire protein sequence through the multi-head self-attention mechanism. We first pre-train the model on a masked language modeling task to learn contextual information from phage protein sequences, which aids in understanding the significance and relevance of individual components in relation to the entire sequence. Subsequently, the pre-trained model is fine-tuned for the PVP identification and classification tasks. Our experimental results show the superiority of the proposed BERTPVP over state-of-the-art methods in identifying and classifying PVPs. Moreover, an ablation study demonstrates that the pre-training and fine-tuning components of BERTPVP are necessary for accelerating convergence and improving prediction performance.
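The masked language modeling objective mentioned in the abstract can be illustrated with a minimal sketch: a fraction of amino-acid tokens in a protein sequence is hidden, and the model must recover them from context. The function and token names below (`mask_sequence`, `MASK_TOKEN`) are illustrative assumptions, not part of BERTPVP itself.

```python
# Minimal sketch of MLM-style masking for a protein sequence.
# Assumption: 15% of residues are masked, following common BERT practice;
# BERTPVP's exact masking scheme may differ.
import random

MASK_TOKEN = "[MASK]"

def mask_sequence(seq, mask_ratio=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels hold the original residue at masked positions and None
    elsewhere, so a model can be trained to predict the hidden residues.
    """
    rng = random.Random(seed)
    tokens = list(seq)
    labels = [None] * len(tokens)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    for i in rng.sample(range(len(tokens)), n_mask):
        labels[i] = tokens[i]   # remember the true residue
        tokens[i] = MASK_TOKEN  # hide it from the model
    return tokens, labels

masked, labels = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEV")
```

During pre-training, a Transformer encoder would consume `masked` and be scored on how well it predicts the residues stored in `labels`; fine-tuning then replaces this objective with the PVP identification or classification head.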
External IDs: dblp:journals/tcbb/MaZBFXL25