Keywords: Joint Embedding Learning, Generative Model, Transformer Autoencoder, Inverse Protein Folding, Sequence Design
Abstract: Designing novel protein sequences consistent with a desired 3D structure or fold, often referred to as the inverse protein folding problem, is a central but non-trivial task in protein engineering. It has a wide range of applications in energy, biomedicine, and materials science. However, challenges exist due to the complex sequence-fold relationship and the difficulties associated with modeling 3D folds. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific fold. Our model learns a fold embedding from the density of the secondary structural elements in 3D voxels, and then models the complex sequence-structure relationship by learning a joint sequence-fold embedding. Experiments on a test set of high-resolution, complete, single structures demonstrate improved performance of Fold2Seq in terms of speed and reliability for sequence design, compared to existing baselines including the state-of-the-art RosettaDesign and other neural net-based approaches. The unique advantages of fold-based Fold2Seq become more evident on diverse real-world test sets comprising low-resolution, incomplete, or ensemble structures, in comparison to a structure-based model.
One-sentence Summary: A novel transformer-based generative model that learns a joint sequence-fold embedding and designs protein sequences, showing superior performance and efficiency compared with existing methods.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2106.13058/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=pq7R2sGiO7
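
The abstract describes the fold input as a density of secondary structural elements (SSEs) in 3D voxels. The following is a minimal sketch of one way such a representation could be built, not the authors' implementation. The function name `sse_density_grid`, the three-label set in `SSE_TYPES`, the grid size, and the Gaussian smoothing width are all assumptions made for illustration.

```python
import numpy as np

# Assumed label set for illustration; the paper may use a different one.
SSE_TYPES = ("helix", "strand", "loop")


def sse_density_grid(coords, sse_labels, grid_size=5, sigma=1.0):
    """Return a (G, G, G, C) array with one density channel per SSE type.

    coords:     (N, 3) residue coordinates (e.g. C-alpha positions).
    sse_labels: length-N sequence of labels drawn from SSE_TYPES.
    """
    coords = np.asarray(coords, dtype=float)

    # Rescale the structure into a cube of side `grid_size` so that
    # proteins of different sizes map onto the same fixed-size grid.
    mins = coords.min(axis=0)
    span = np.maximum(coords.max(axis=0) - mins, 1e-8)
    scaled = (coords - mins) / span * grid_size

    # Coordinates of every voxel centre, shape (G, G, G, 3).
    centres_1d = np.arange(grid_size) + 0.5
    gx, gy, gz = np.meshgrid(centres_1d, centres_1d, centres_1d, indexing="ij")
    centres = np.stack([gx, gy, gz], axis=-1)

    grid = np.zeros((grid_size, grid_size, grid_size, len(SSE_TYPES)))
    for pos, label in zip(scaled, sse_labels):
        channel = SSE_TYPES.index(label)
        # Each residue adds a Gaussian bump to the channel of its SSE type.
        sq_dist = np.sum((centres - pos) ** 2, axis=-1)
        grid[..., channel] += np.exp(-sq_dist / (2.0 * sigma**2))
    return grid


# Toy input: four residues, two in a helix and two in a loop.
toy_coords = [[0.0, 0.0, 0.0], [1.5, 0.5, 0.0], [3.0, 1.0, 2.0], [4.0, 4.0, 4.0]]
toy_labels = ["helix", "helix", "loop", "loop"]
toy_grid = sse_density_grid(toy_coords, toy_labels)
```

Because the grid depends only on where SSEs sit in space, not on exact atomic coordinates, a representation of this kind would plausibly tolerate low-resolution or incomplete structures, which is the advantage the abstract claims for a fold-based model over a structure-based one.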