Keywords: Protein Design, Protein Structure Prediction, Structure-based Protein Sequence Design, Protein Folding, Protein Inverse Folding
TL;DR: We learn protein structure prediction and structure-based sequence design end-to-end in a discrete, stochastic autoencoder.
Abstract: Deep-learning-based methods have revolutionized protein structure prediction and structure-based sequence design. Despite their success, current methods still face limitations. Protein structure prediction requires large models, which often act as bottlenecks in protein design workflows. Design methods prioritize sequence recovery as a surrogate for structure recovery. We address these limitations in our model, E2EFold, by learning both tasks end-to-end in a discrete, stochastic autoencoder. E2EFold is trained to reconstruct an input backbone and predict sidechain conformations. An auxiliary sequence recovery objective guides the encoder to predict a sequence distribution conditioned on the backbone. Discrete sequences are sampled differentiably from this distribution and passed to the decoder for structure prediction. We find that our end-to-end framework significantly improves the self-consistency of designed sequences. On designed sequences, our model's structure predictions correlate with Boltz-2’s while relying on more than an order of magnitude fewer parameters. Taken together, these results suggest a promising framework for advancing protein structure prediction and sequence design.
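The abstract does not say which estimator is used to sample discrete sequences differentiably. As a minimal sketch, assuming a Gumbel-softmax relaxation with a hard (one-hot) forward pass, the sampling step between encoder and decoder could look like the following. The names `gumbel_softmax_sample`, `sample_sequence`, and `AMINO_ACIDS` are hypothetical, and only the forward pass is shown; in an autodiff framework, gradients would typically flow through the soft probabilities while the decoder receives the one-hot sample.

```python
import math
import random

# Assumption: a 20-letter alphabet of standard amino acids, in arbitrary order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, temperature, rng):
    """Return (soft, hard) samples for one residue position.

    soft: relaxed probabilities (the differentiable surrogate).
    hard: one-hot vector at the argmax of soft (the discrete sample).
    """
    noisy = []
    for logit in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    soft = softmax(noisy)
    best = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == best else 0.0 for i in range(len(soft))]
    return soft, hard


def sample_sequence(per_residue_logits, temperature=1.0, seed=0):
    """Sample one discrete sequence from per-residue encoder logits."""
    rng = random.Random(seed)
    residues = []
    for logits in per_residue_logits:
        _, hard = gumbel_softmax_sample(logits, temperature, rng)
        residues.append(AMINO_ACIDS[hard.index(1.0)])
    return "".join(residues)
```

Lower temperatures push the soft sample toward the one-hot sample, at the cost of higher-variance gradients; the value used by the authors is not stated in the abstract.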
Submission Track: Paper Track (Short Paper)
Submission Category: AI-Guided Design + Automated Material Characterization
Institution Location: Lausanne, Switzerland
AI4Mat RLSF: Yes
Submission Number: 87