Keywords: Protein Generation, Diffusion, Posterior Sampling, Inverse Problems
TL;DR: We introduce a novel diffusion-based posterior sampling algorithm for generating discrete protein sequences, and outperform the current state of the art on protein design benchmarks.
Abstract: Designing new protein sequences that exhibit desirable functionality carries significant implications for medicine and biotechnology. Traditional protein design has relied predominantly on experimental methods, such as in vitro screening or animal experiments, which are costly and time-consuming. We propose a generative-model-based approach to protein sequence generation using guided discrete diffusion. We introduce a novel diffusion-based posterior sampling algorithm that uses a BERT-like transformer model to iteratively denoise discrete protein sequences. This approach provides an efficient way to leverage an oracle that is trained to predict the desired functionality and guides the protein generation procedure. Our experiments demonstrate that our method outperforms the state of the art, achieving higher functionality scores as well as higher ProtGPT2 likelihood scores.
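To make the idea of oracle-guided iterative denoising concrete, the following is a minimal toy sketch and not the authors' algorithm. It assumes a masked-token formulation in which a sequence starts fully masked and is filled in over several steps. Every name here is hypothetical: `toy_denoiser` stands in for the BERT-like transformer (it simply returns uniform probabilities), `toy_oracle` stands in for a trained functionality predictor (it returns the fraction of hydrophobic residues), and the `guidance` weight and the exponential reweighting rule are illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
MASK = "_"  # placeholder for a not-yet-denoised position


def toy_denoiser(seq):
    """Hypothetical stand-in for a BERT-like denoiser.

    Returns one probability table per position; here it is uniform.
    """
    p = 1.0 / len(AMINO_ACIDS)
    return [{a: p for a in AMINO_ACIDS} for _ in seq]


def toy_oracle(seq):
    """Hypothetical functionality score: fraction of hydrophobic residues."""
    return sum(ch in "AILMFVW" for ch in seq) / len(seq)


def guided_sample(length, steps, guidance, rng):
    """Fill a fully masked sequence over several steps, tilting the
    denoiser's proposal at each position by exp(guidance * oracle score)."""
    seq = [MASK] * length
    masked = list(range(length))
    rng.shuffle(masked)  # random unmasking order
    per_step = math.ceil(length / steps)
    while masked:
        probs = toy_denoiser(seq)  # one denoiser call per step
        for pos in masked[:per_step]:
            # Oracle score of each candidate residue at this position.
            scores = [
                guidance * toy_oracle(seq[:pos] + [a] + seq[pos + 1:])
                for a in AMINO_ACIDS
            ]
            top = max(scores)  # subtract the max for numerical stability
            weights = [
                probs[pos][a] * math.exp(s - top)
                for a, s in zip(AMINO_ACIDS, scores)
            ]
            seq[pos] = rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
        masked = masked[per_step:]
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    print(guided_sample(length=30, steps=5, guidance=0.0, rng=rng))
    print(guided_sample(length=30, steps=5, guidance=600.0, rng=rng))
```

With `guidance=0.0` the sampler reduces to drawing from the denoiser alone, while a larger value pushes samples toward sequences the oracle scores highly. In a real setting both stand-ins would be replaced by trained models.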
Submission Number: 72