A generalized protein design ML model enables generation of functional de novo proteins

Timothy P Riley; Oleg Matusovsky; Mohammad S. Parsa; Pourya Kalantari; Kooshiar Azimian; Kathy Y Wei

Published: 06 Mar 2025, Last Modified: 26 Apr 2025, Venue: GEM, Readers: Everyone, License: CC BY 4.0

Track: Biology: datasets and/or experimental results

Nature Biotechnology: Yes

Keywords: De novo protein design, molecule programming, text-to-protein, protein ML model, machine learning

TL;DR: MP4 is a transformer-based AI that generates functional de novo proteins from text prompts with high experimental success rates, paving the way for molecule programming.

Abstract: Despite significant advancements, the creation of functional proteins de novo remains a fundamental challenge. Although deep learning has revolutionized applications such as protein folding, a critical gap persists in integrating design objectives across structure and function. Here, we present MP4, a transformer-based AI model that generates novel sequences from functional text prompts, enabling the design of fully folded, functional proteins from minimal input specifications. Our approach can generate entirely novel proteins with high experimental success rates and can also effectively redesign existing proteins. The model highlights the potential of generalist AI to address complex challenges in protein design, offering a versatile alternative to specialized approaches.

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Presenter: ~Kathy_Y_Wei1

Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).

Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.

Submission Number: 52
