PROTEIN DESIGNER BASED ON SEQUENCE PROFILE USING ULTRAFAST SHAPE RECOGNITION

20 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Protein sequence design, Sequence profile, Ultrafast shape recognition, Protein language models, Graph neural network
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: SPDesign adapts techniques from protein structure prediction to sequence design, and its results are significantly better than those of SOTA methods.
Abstract: Designing proteins with a specified structure and function can deepen our understanding of living systems and facilitate the fight against disease, and sequence design is a critical component of that process. With the continuous development of deep learning, existing methods have shown excellent performance in protein sequence design. However, most of them focus on optimizing the network architecture to improve performance while ignoring the explicit biochemical features of proteins. Observing the remarkable success achieved with structural templates and pre-trained knowledge in protein structure prediction, we explore whether similar sequence patterns and representations of underlying structural knowledge can be used in protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on a sequence profile obtained using ultrafast shape recognition. For an input backbone structure, SPDesign uses ultrafast shape recognition vectors to search for similar protein structures (structural analogs) in the PAcluster80 structure library. It then extracts a sequence profile from the analogs through structural alignment. The profile, together with structural pre-trained knowledge and geometric features, is further condensed to provide reliable sequence patterns for an improved graph neural network. Experimental results show that SPDesign significantly outperforms state-of-the-art methods such as LM-Design and PiFold on the CATH 4.2 benchmark, with gains in sequence recovery rate of 11.4% and 15.54%, respectively. Results on the TS50 and TS500 benchmarks are also encouraging, with performance reaching 68.64% and 71.63%, respectively. Notably, our method also performs strongly on de novo designed proteins and orphan proteins, which are close to practical application scenarios. Finally, a structural modeling verification experiment shows that the sequences designed by our method fold into the native structures more accurately.
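The abstract names ultrafast shape recognition (USR) vectors as the search key but does not spell out how they are computed. The sketch below follows the generic USR recipe: the first three moments of the distance distributions to four reference points, giving a 12-element vector, compared through an inverse Manhattan-distance score. It is an illustration under stated assumptions, not SPDesign's implementation: the function names `usr_descriptor` and `usr_similarity` are hypothetical, the input is assumed to be a list of Cα coordinates, and the third component uses the signed cube root of the third central moment, which is one common convention.

```python
import math


def _moments(dists):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # Signed cube root keeps the skew term in distance units (assumed convention).
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords):
    """12-element USR vector for a list of (x, y, z) points (assumed: C-alpha atoms)."""
    n = len(coords)
    # Reference point 1: centroid of all points.
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(ctd, c) for c in coords]
    # Reference points 2 and 3: points closest to and farthest from the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: point farthest from reference point 3.
    d_fct = [math.dist(fct, c) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(ref, c) for c in coords]))
    return desc


def usr_similarity(a, b):
    """Score in (0, 1]; 1.0 means the two descriptors are identical."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0),
             (4.5, 1.0, 2.0), (6.0, 0.0, 3.0)]
    other = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
             (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
    print(usr_similarity(usr_descriptor(query), usr_descriptor(other)))
```

Because the descriptor depends only on intra-structure distances, it is unchanged by translation and rotation, so candidate analogs can be ranked by comparing short vectors before any structural alignment is run.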
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2622