Keywords: Protein function, transformer, pretrain, region proposal network
Abstract: Accurately predicting protein function remains a significant challenge because of the intricate interplay among sequence, structure, and function. These relationships, shaped by physical principles and evolutionary pressure, reflect the inherent complexity of biological systems. Recent deep learning methods struggle to capture the functional significance of key residues because they rely predominantly on post hoc analyses or on global structural features, which leads to suboptimal performance. To address this limitation, we introduce the Protein Region Proposal Network (ProteinRPN), the first protein function prediction framework that integrates functional residue identification directly into the prediction pipeline. ProteinRPN features a function region proposal module that identifies potential functional regions (anchors) using secondary structure definitions and spatial proximity. These anchors are refined by specialized attention mechanisms and then pooled by a Graph Multiset Pooling layer. The model is trained on perturbed protein structures with supervised contrastive (SupCon) and InfoNCE losses, which enables it to capture both the spatial clustering and the functional roles of residues. Notably, it improves AUPR by 15.4%, 8.5%, and 1.3% on the Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) ontologies, respectively. These results underscore its ability to capture the functional relevance of key residues and to advance protein function prediction.
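The abstract names SupCon and InfoNCE as the training losses but gives no formulas. The sketch below shows the standard forms of both, as an illustration only and not ProteinRPN's actual training code. It assumes that each protein is already reduced to a plain list of floats (a stand-in for the pooled representation), that InfoNCE pairs row i of one perturbed view with row i of a second perturbed view, and that each protein carries a single integer label. Real GO annotations are multi-label, so that last point is a simplification. The function names and the temperature `tau = 0.1` are hypothetical placeholders.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce(view_a: Sequence[Sequence[float]],
             view_b: Sequence[Sequence[float]],
             tau: float = 0.1) -> float:
    """InfoNCE (hypothetical helper): row i of view_a and row i of view_b embed two
    perturbed copies of the same protein (positive pair); every other row of
    view_b acts as a negative for anchor i."""
    za = [l2_normalize(v) for v in view_a]
    zb = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(za):
        logits = [dot(anchor, other) / tau for other in zb]
        # -log softmax probability assigned to the positive pair
        total += logsumexp(logits) - logits[i]
    return total / len(za)


def supcon(embeddings: Sequence[Sequence[float]],
           labels: Sequence[int],
           tau: float = 0.1) -> float:
    """Supervised contrastive loss (hypothetical helper): every other sample that
    shares the anchor's label is a positive; all other samples form the denominator."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    per_anchor = []
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without any positive contribute nothing
        logits = {a: dot(z[i], z[a]) / tau for a in range(n) if a != i}
        log_denom = logsumexp(list(logits.values()))
        # average -log p(positive | anchor) over this anchor's positives
        per_anchor.append(sum(log_denom - logits[p] for p in positives) / len(positives))
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0
```

For example, `info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])` is small because each view sits closest to its own partner. Reversing the second view's rows makes the loss large. Both losses pull embeddings of related proteins together and push unrelated ones apart. InfoNCE defines "related" as two perturbations of the same structure, while SupCon uses shared labels.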
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 7775