Keywords: Multimodality, Deep Generative Models, Protein Design
TL;DR: BioM3 is a novel multimodal generative AI framework that uses natural language prompts to design functional proteins, with experimental validation demonstrating its ability to generate artificial proteins that function in vivo and in vitro.
Abstract: The advent of natural language interaction with machines has driven innovations in text-guided generation of images, audio, video, and more. In this arena, we introduce the Biological Multi-Modal Model (BioM3), a novel framework for designing functional proteins via natural language prompts. This framework integrates natural language with protein design through a three-stage process: aligning protein and text representations in a joint embedding space learned by contrastive learning, refining the text embeddings, and conditionally generating protein sequences with a discrete autoregressive diffusion model. BioM3 synthesizes protein sequences with detailed descriptions of protein structure, lineage, and function from text annotations to enable the conditional generation of novel sequences with desired attributes through natural language prompts. We present in silico validation of the model on subcellular localization prediction, reaction classification, remote homology detection, scaffold in-painting, and structural plausibility, together with in vivo and in vitro experimental tests of natural language prompt-designed synthetic analogs of Src-homology 3 (SH3) domain proteins that mediate signaling in the Sho1 osmotic stress response pathway in baker's yeast. BioM3 achieves state-of-the-art performance in zero-shot prediction and homology detection tasks, and generates proteins with native-like tertiary folds and wild-type levels of experimentally assayed function.
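The first stage, aligning protein and text representations by contrastive learning, can be illustrated with a minimal sketch. The abstract does not specify the loss, so the symmetric InfoNCE (CLIP-style) objective below is an assumption, as are the function names, the temperature value, and the toy embeddings; it is not the BioM3 implementation.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean of -log softmax(row_i)[i]: each row's correct match is its own index."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(protein_emb, text_emb, temperature=0.07):
    """Assumed CLIP-style loss: pair i of proteins matches pair i of texts."""
    p = [normalize(v) for v in protein_emb]
    t = [normalize(v) for v in text_emb]
    # Pairwise cosine similarities, scaled by the (hypothetical) temperature.
    logits = [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t]
              for pi in p]
    transposed = [list(col) for col in zip(*logits)]
    # Average the protein-to-text and text-to-protein directions.
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))


# Toy data: "protein" embeddings that are noisy copies of the text embeddings.
random.seed(0)
text = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
aligned = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in text]
mismatched = aligned[::-1]  # reversing 8 rows breaks every pairing

loss_aligned = symmetric_contrastive_loss(aligned, text)
loss_mismatched = symmetric_contrastive_loss(mismatched, text)
print(loss_aligned < loss_mismatched)
```

Under this assumed objective, correctly paired embeddings yield a lower loss than mismatched pairs, which is the signal that pulls matching protein and text representations together in the joint space.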
Submission Number: 109