Keywords: representation learning, structural bioinformatics, proteins
Abstract: Learning from 3D protein structures has gained considerable attention in protein modeling and structural bioinformatics. Unfortunately, the number of available structures is orders of magnitude smaller than the number of available protein sequences, and it shrinks further when only annotated structures are considered. This scarcity makes existing models difficult to train and prone to overfitting. To address this limitation, we introduce a new representation learning framework for 3D protein structures. Our framework uses unsupervised contrastive learning to learn meaningful representations of protein structures, making use of both annotated and unannotated proteins from the Protein Data Bank. We show how these representations can be used directly to solve different tasks in structural bioinformatics, such as protein function prediction and protein structural similarity prediction. Moreover, we show that fine-tuning networks pre-trained with our algorithm leads to significantly improved task performance.