Keywords: Generative models, multi-modality, de novo generation, protein
TL;DR: A multi-modal generative model for protein co-design without tokenizers
Abstract: Proteins are fundamental to biological processes, with their function determined by the complex interplay between their amino acid sequence and three-dimensional structure. Developing generative models that capture this intrinsically multi-modal relationship is crucial for fields like drug discovery and protein engineering. Existing models often rely on a multi-stage training process: first, autoencoders are trained to tokenize the data into latent representations; second, a generative model is trained on those latent representations, i.e., generative modeling is performed in a latent space. We hypothesize that this multi-stage training process is not required to obtain performant co-design models and thus present SimpleDesign, an effective multi-modal protein design model trained directly in the raw data space. SimpleDesign uses a simple end-to-end training objective with two terms: a discrete cross-entropy loss for protein sequences and a continuous flow-matching regression objective for protein structures.
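As a rough illustration of this two-term objective, here is a minimal PyTorch sketch; the names `simpledesign_loss` and `struct_weight` and the linear-path velocity target are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def flow_matching_target(x0, x1, t):
    """Linear conditional path x_t = (1 - t) * x0 + t * x1; under this
    path the regression target is the constant velocity x1 - x0.
    t should broadcast against x0/x1, e.g. shape (B, 1, 1)."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

def simpledesign_loss(seq_logits, seq_targets, pred_velocity, target_velocity,
                      struct_weight=1.0):
    """Two-term co-design objective (sketch): discrete cross-entropy on
    sequences plus continuous flow-matching regression on structures."""
    # seq_logits: (B, L, n_amino_acids); seq_targets: (B, L) integer labels.
    ce = F.cross_entropy(seq_logits.transpose(1, 2), seq_targets)
    # pred/target velocity: (B, L, 3), the structure's velocity field.
    fm = F.mse_loss(pred_velocity, target_velocity)
    return ce + struct_weight * fm
```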
To better model the sequence and structure modalities, we develop a Mixture-of-Transformers architecture that enables modality-specific processing while retaining global self-attention over both modalities.
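A minimal sketch of one such block, assuming per-modality feed-forward experts and layer norms with a single shared self-attention over the concatenated sequence and structure tokens; all names and dimensions are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MoTBlock(nn.Module):
    """One Mixture-of-Transformers-style block (illustrative sketch):
    modality-specific parameters, global self-attention over both modalities."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Separate feed-forward experts and norms per modality.
        self.ffn = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                             nn.Linear(4 * d_model, d_model))
            for m in ("seq", "struct")
        })
        self.norm1 = nn.ModuleDict({m: nn.LayerNorm(d_model) for m in ("seq", "struct")})
        self.norm2 = nn.ModuleDict({m: nn.LayerNorm(d_model) for m in ("seq", "struct")})

    def forward(self, seq_tokens, struct_tokens):
        # Normalize each modality with its own parameters, then attend jointly
        # over the concatenation so every token sees both modalities.
        n_seq = seq_tokens.size(1)
        x = torch.cat([self.norm1["seq"](seq_tokens),
                       self.norm1["struct"](struct_tokens)], dim=1)
        attn_out, _ = self.attn(x, x, x)
        h = torch.cat([seq_tokens, struct_tokens], dim=1) + attn_out
        h_seq, h_struct = h[:, :n_seq], h[:, n_seq:]
        # Modality-specific feed-forward processing with residual connections.
        h_seq = h_seq + self.ffn["seq"](self.norm2["seq"](h_seq))
        h_struct = h_struct + self.ffn["struct"](self.norm2["struct"](h_struct))
        return h_seq, h_struct
```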
We train SimpleDesign on 1.8M sequence-structure pairs, achieving strong performance across co-design and unconditional sequence/structure generation benchmarks.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9626