Abstract: Accurately predicting essential proteins is vital for comprehending organism survival, aiding in drug discovery, and informing strategies for treating diseases. While previous computational methods for essential protein prediction have predominantly focused on network-based approaches, recent advancements have seen rapid development in sequence-based prediction methods. However, existing sequence-based prediction methods tend to focus only on sequence-level features, ignoring other biological information at diverse levels. To make use of the diverse information across various biological levels, in this study, we introduce ComLMEss, a novel deep learning framework that combines three protein language models. ComLMEss integrates ProtTrans, ESMFold and OntoProtein, which contain different levels of biological information include protein sequence, conservation, structural, and functional information. ComLMEss employs convolutional neural networks and transformer structure to refine and contextualize the representations from three language models, enabling accurate and robust predictions. Experimental results demonstrate that ComLMEss consistently outperforms existing methods. Ablation studies confirm that the effectiveness of combining different language models focus on different biological information. All results underscore the potential of ComLMEss in essential protein prediction. The source code can be obtained at https://github.com/CSUBioGroup/ComLMEss.
External IDs:dblp:conf/bibm/LiuLZZGLZ24
Loading