Keywords: Local graph structure, contrastive learning, similarity score
Abstract: Accurately computing protein similarity is challenging because of the intricate interplay between local substructures and the global structure of a protein molecule. Traditional metrics such as TM-score align the global structures of two proteins algorithmically and can therefore overlook important local-global relations and contextual comparisons. We introduce EmbedSimScore, a self-supervised method that generates structural and contextual embeddings by jointly considering the local substructures and the global structure of each protein. Using contrastive language-structure pre-training (CLSP) together with structural contrastive learning, EmbedSimScore captures features across different scales of protein structure. Similarities computed from these embeddings are more precise and holistic, and they reveal intrinsic relations among proteins that traditional approaches overlook.
Submission Number: 44
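The abstract describes two ingredients: a contrastive objective that aligns paired embeddings, and a similarity score computed from the resulting embeddings. The sketch below illustrates both in general terms, assuming a CLIP-style symmetric InfoNCE loss and cosine similarity as the score. The function names, the temperature value, and the choice of cosine similarity are illustrative assumptions and are not taken from the paper; the actual EmbedSimScore encoders and loss may differ.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def embedding_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1].

    Assumption: the similarity score is cosine similarity of embeddings.
    """
    return float(np.dot(l2_normalize(a), l2_normalize(b)))


def symmetric_contrastive_loss(z_a, z_b, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss over a batch of paired embeddings.

    z_a, z_b: arrays of shape (batch, dim). Row i of z_a and row i of z_b are
    a positive pair (for example, two views of the same protein); all other
    rows in the batch act as negatives. The temperature is a hypothetical
    default, not a value from the paper.
    """
    z_a = l2_normalize(z_a)
    z_b = l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature  # (batch, batch) scaled cosine similarities

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row; the target is the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

Under these assumptions, a batch whose pairs are correctly aligned yields a loss near zero, while a batch with mismatched pairs yields a much larger loss, which is the signal that pushes paired embeddings together during training.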