Keywords: Computational Structural Biology, Protein Engineering, Retrieval-Augmented Framework
Abstract: Predicting the effects of protein mutations is crucial for analyzing protein functions and understanding genetic diseases.
However, existing models struggle to extract mutation-related local structure motifs from protein databases, which limits their predictive accuracy and robustness. To address this problem, we design a novel retrieval-augmented framework that incorporates information from similar local structures found in known protein structures. We build a vector database of local structure motif embeddings produced by a pre-trained protein structure encoder, which allows efficient retrieval of similar local structure motifs during mutation effect prediction.
Our results show that this method achieves state-of-the-art (SOTA) performance across multiple protein mutation prediction datasets and offers a scalable solution for studying mutation effects.
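The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoder, the similarity metric, or the index type, so everything here is an assumption for illustration: random vectors stand in for encoder-produced motif embeddings, cosine similarity is the assumed metric, and a brute-force search replaces whatever vector database the authors use. The function names `build_index` and `retrieve_top_k` are hypothetical.

```python
import numpy as np


def build_index(embeddings):
    """L2-normalize embeddings so a dot product equals cosine similarity."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def retrieve_top_k(index, query, k=3):
    """Return indices and cosine similarities of the k nearest stored motifs."""
    query = np.asarray(query, dtype=np.float64)
    query = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ query           # cosine similarity to every stored motif
    k = min(k, len(scores))
    top = np.argsort(-scores)[:k]    # highest similarity first
    return top, scores[top]


# Toy data: random vectors stand in for motif embeddings (assumption).
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 64))
index = build_index(database)

# A query that is a slightly perturbed copy of stored motif 42.
query = database[42] + 0.01 * rng.normal(size=64)
ids, sims = retrieve_top_k(index, query, k=5)
```

In this toy setup the perturbed copy of motif 42 should retrieve motif 42 as its nearest neighbor. A database covering many protein structures would likely need an approximate nearest-neighbor index rather than this exhaustive scan.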
Primary Area: Machine learning for healthcare
Submission Number: 18660