Keywords: Sequence design, Noise label, Diffusion-model-based backbone design, Motif-scaffolding
TL;DR: Our paper presents a method to improve protein sequence design by introducing per-residue Gaussian noise and noise labels to account for varying backbone quality, significantly enhancing success rates in sequence design and motif scaffolding.
Abstract: Many recent protein design pipelines first generate a backbone structure and then design a sequence for it. ProteinMPNN is one of the most popular methods for this sequence design step. A key limitation of ProteinMPNN is that it cannot account for backbone quality that varies from position to position, which is problematic for structures that contain both high-certainty and low-certainty regions. To address this, we propose (1) adding a larger amount of Gaussian noise at the per-residue level and (2) labeling the amount of noise added to each residue with a new feature, called a "noise label", that informs the model about backbone uncertainty. This change significantly improves sequence design success rates, as measured by the TM-score between the target backbone and the structure predicted from the designed sequence. For partially redesigned scaffolds (i.e., motif scaffolding for enzymes or other functional proteins), we assign noise labels to the redesigned scaffold residues while fixing the noise label of motif residues at 0. This yields higher motif-scaffolding success rates, with lower motif RMSD and lower overall structure RMSD. Overall, residue-wise noise labels improve the design of high-designability sequences for structures generated by a variety of protein generation models.
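The per-residue noising and labeling scheme can be illustrated with a minimal NumPy sketch. It assumes backbone coordinates stored as an array of shape (L, A, 3) and takes a residue's noise label to be the standard deviation of the Gaussian noise added to that residue. The function names `noise_backbone` and `add_noise_label_feature`, the `max_sigma` parameter, and the uniform distribution used to draw per-residue noise scales are all hypothetical choices for illustration; this is not the paper's implementation.

```python
import numpy as np


def noise_backbone(coords, max_sigma=1.0, motif_mask=None, rng=None):
    """Add per-residue Gaussian noise to backbone coordinates (illustrative sketch).

    coords: array of shape (L, A, 3), L residues with A backbone atoms each.
    max_sigma: hypothetical upper bound (in angstroms) on each residue's noise scale.
    motif_mask: optional boolean array of shape (L,); True marks motif residues,
        which receive no noise and therefore a noise label of 0.
    Returns (noisy_coords, noise_labels), where noise_labels[i] is the standard
    deviation of the noise added to residue i.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_res = coords.shape[0]
    # Assumption: draw one noise scale per residue uniformly from [0, max_sigma).
    sigma = rng.uniform(0.0, max_sigma, size=num_res)
    if motif_mask is not None:
        # Motif residues stay fixed: zero noise, noise label 0.
        sigma = np.where(motif_mask, 0.0, sigma)
    # Broadcast each residue's sigma over its atoms and x/y/z components.
    noise = rng.standard_normal(coords.shape) * sigma[:, None, None]
    return coords + noise, sigma


def add_noise_label_feature(node_features, noise_labels):
    """Append the per-residue noise label as one extra node-feature column."""
    return np.concatenate([node_features, noise_labels[:, None]], axis=-1)


# Usage: 10 residues with 4 backbone atoms each; residues 2-4 form the motif.
rng = np.random.default_rng(0)
coords = rng.standard_normal((10, 4, 3))
motif = np.zeros(10, dtype=bool)
motif[2:5] = True
noisy, labels = noise_backbone(coords, max_sigma=0.5, motif_mask=motif, rng=rng)
features = add_noise_label_feature(np.zeros((10, 16)), labels)
```

Because motif residues get a noise scale of exactly 0, their coordinates pass through unchanged and their label is 0, mirroring the fixed motif label in the motif-scaffolding setting; scaffold residues carry a nonzero label that tells the model how much to trust their positions.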
Submission Number: 90