TL;DR: We introduce biology-aware graph augmentations for GCL-based protein representation learning that integrate 2D functional community invariance and 3D structure invariance.
Abstract: Graph Contrastive Learning (GCL) improves Graph Neural Network (GNN)-based protein representation learning by enhancing generalization and robustness. Existing GCL approaches for protein representation learning rely on 2D topology: graph augmentation is based solely on topological features and ignores the intrinsic biological properties of proteins. Moreover, 3D structure-based protein graph augmentation remains unexplored, even though proteins are inherently 3D structures. To bridge this gap, we propose novel biology-aware graph augmentation strategies from the perspective of invariance and integrate them into the protein GCL framework. Specifically, we introduce Functional Community Invariance (FCI)-based graph augmentation, which employs spectral constraints to preserve topology-driven community structures while incorporating residue-level chemical similarity as edge weights to guide edge sampling and maintain functional communities. Furthermore, we propose 3D Protein Structure Invariance (3-PSI)-based graph augmentation, which leverages dihedral angle perturbations and secondary structure rotations to retain critical 3D structural information of proteins while diversifying graph views.
Extensive experiments on four different protein-related tasks demonstrate the superiority of our proposed GCL protein representation learning framework.
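To make the two augmentation ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a residue graph given as an edge list with one positive similarity weight per edge, and per-residue 3D coordinates. The function names `sample_edges` and `rotate_segment`, the `keep_ratio` parameter, and the choice of rotation axis (the line through the segment's first and last residue) are hypothetical; the spectral constraints and dihedral angle perturbations mentioned above are not modeled here.

```python
import numpy as np

def sample_edges(edges, similarity, keep_ratio, rng):
    """Keep a fraction of edges, favoring edges with higher similarity weight.

    Assumption: `similarity` holds one positive weight per edge (for example,
    a residue-level chemical similarity score).
    """
    edges = np.asarray(edges)
    weights = np.asarray(similarity, dtype=float)
    probs = weights / weights.sum()
    n_keep = max(1, int(round(keep_ratio * len(edges))))
    # Sample without replacement, so high-similarity edges are more likely kept.
    kept = rng.choice(len(edges), size=n_keep, replace=False, p=probs)
    return edges[np.sort(kept)]

def rotate_segment(coords, start, end, angle):
    """Rigidly rotate residues start..end-1 about the axis through the
    segment's first and last residue (Rodrigues' rotation formula).

    Distances inside the segment are preserved and its two end points stay
    fixed. Assumption: the segment has at least two distinct end points.
    """
    coords = np.asarray(coords, dtype=float).copy()
    seg = coords[start:end]
    origin = seg[0]
    axis = seg[-1] - seg[0]
    axis /= np.linalg.norm(axis)
    v = seg - origin
    cos, sin = np.cos(angle), np.sin(angle)
    rotated = (v * cos
               + np.cross(axis, v) * sin
               + np.outer(v @ axis, axis) * (1.0 - cos))
    coords[start:end] = rotated + origin
    return coords
```

A second view of the same protein could then be produced by calling `sample_edges` with a seeded `np.random.default_rng` and `rotate_segment` on a chosen residue range; how segments and angles are selected in the actual framework is not specified in the abstract.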
Lay Summary: Proteins are essential building blocks of life, and understanding their structure and function is crucial for many scientific advancements, like developing new medicines. Computers can help us study proteins by learning to "recognize" and "understand" them. Our research introduces new and improved ways for computers to learn about proteins. Currently, many computer methods learn about proteins by looking at simplified, 2D maps of their connections. This often misses important details about what proteins actually do and their complex 3D shapes. We've developed new techniques that teach computers by showing them slightly different versions of the same protein, helping them focus on the most important features. One technique considers the biological roles of different parts of a protein, ensuring that these functional groups are preserved even as the computer sees slightly altered views. Another technique focuses on the protein's 3D shape, making small adjustments to its structure (like wiggling or rotating parts) while keeping its overall form intact. By learning from these more biologically and structurally realistic variations, computers can build a much better understanding of proteins. Our tests show that these new methods significantly improve the computer's ability to perform various tasks related to proteins, paving the way for faster and more accurate discoveries in biology and medicine.
Primary Area: Applications->Everything Else
Keywords: Protein Representation Learning, Graph Contrastive Learning, Graph Augmentation, 3D geometry modeling, Structure Invariance
Submission Number: 4505