EmbedSimScore: Advancing Protein Similarity Analysis with Structural and Contextual Embeddings

Published: 13 Oct 2024, Last Modified: 02 Dec 2024
Venue: NeurIPS 2024 Workshop SSL
License: CC BY 4.0
Keywords: Local graph structure, contrastive learning, similarity score
Abstract: Accurately computing protein similarity is challenging because local substructures and global structure interact closely within a protein molecule. Traditional metrics such as TM-score focus on algorithmically aligning the global structures of two proteins, and can therefore overlook important local-global relations and contextual comparisons. We introduce EmbedSimScore, a self-supervised method that generates structural and contextual embeddings by jointly considering the local substructures and the global structure of each protein. Using contrastive language-structure pre-training (CLSP) and structural contrastive learning, EmbedSimScore captures features across different scales of protein structure. These embeddings provide a more precise and holistic basis for computing protein similarity, and they identify intrinsic relations among proteins that traditional approaches overlook.
Submission Number: 44
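
The abstract describes scoring similarity from learned embeddings and training them contrastively, but gives no formulas. As a rough illustration only, the sketch below shows one common way such a pipeline is built: cosine similarity between two embedding vectors, and a CLIP-style symmetric InfoNCE loss over paired "language" and "structure" embeddings. The function names, the temperature value, and the choice of cosine similarity and InfoNCE are all assumptions made for this example; the paper's actual encoders, loss, and scoring function may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(lang_embs, struct_embs, temperature=0.07):
    """CLIP-style symmetric InfoNCE (an assumed stand-in, not the paper's loss).

    lang_embs[i] and struct_embs[i] are treated as a matching pair; every
    other combination in the batch is treated as a negative.
    """
    if not lang_embs or len(lang_embs) != len(struct_embs):
        raise ValueError("need equally sized, non-empty batches of paired embeddings")
    logits = [
        [cosine_similarity(l, s) / temperature for s in struct_embs]
        for l in lang_embs
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))


if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical values, not real protein embeddings).
    protein_a = [0.9, 0.1]
    protein_b = [0.8, 0.3]
    print("similarity:", cosine_similarity(protein_a, protein_b))

    lang = [[1.0, 0.0], [0.0, 1.0]]
    struct = [[1.0, 0.0], [0.0, 1.0]]
    print("loss on aligned pairs:", symmetric_contrastive_loss(lang, struct))
```

In this toy setup the loss is near zero when each language embedding points in the same direction as its paired structure embedding, and it grows when the pairs are mismatched, which is the behaviour a contrastive objective relies on to pull matching views together.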