Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: long paper (4–8 pages excluding references)
Keywords: protein design, motif diversification, embedding learning, structural diversity, diffusion models
TL;DR: PGEL perturbs learned embeddings to generate diverse, functional motifs, outperforming partial diffusion.
Abstract: A fundamental challenge in protein design is the trade-off between generating structural diversity and preserving the biological function of a motif. Current state-of-the-art methods, such as partial diffusion in RFdiffusion, often fail to resolve this trade-off: small perturbations yield motifs nearly identical to the native structure, whereas larger perturbations violate the geometric constraints necessary for biological function. We introduce Protein Generation with Embedding Learning (PGEL), a general framework that learns high-dimensional embeddings encoding the sequence and structural features of a target motif in the representation space of a diffusion model's frozen denoiser, and then enhances motif diversity by introducing controlled perturbations in that embedding space. PGEL can thus relax geometric constraints while still satisfying standard design metrics, yielding more diverse yet viable structures. We demonstrate PGEL on ten representative cases, including a cancer-related transcription factor complex, an antibody-antigen complex, and an enzyme. Compared with partial diffusion, PGEL achieves greater structural diversity, better designability, and improved self-consistency. Our results establish PGEL as a general strategy for embedding-driven protein generation that enables systematic, viable diversification of functional motifs.
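The abstract does not specify how the "controlled perturbations" are drawn. The sketch below is purely illustrative and assumes isotropic Gaussian noise whose scale `sigma` serves as the control knob; `perturb_embedding`, `sample_variants`, and `l2_distance` are hypothetical helpers, not part of PGEL or RFdiffusion. Plain Python lists stand in for learned embedding vectors, and each perturbed vector would, under this assumption, be passed to the frozen denoiser as conditioning.

```python
import math
import random


def perturb_embedding(embedding, sigma, seed=None):
    """Hypothetical helper: add isotropic Gaussian noise (std = sigma) to a learned embedding.

    Assumption for illustration only; PGEL's actual perturbation scheme is not given here.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in embedding]


def l2_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_variants(embedding, sigma, n, seed=0):
    """Draw n perturbed copies of the embedding, one seed per variant.

    In this sketch each variant would condition the frozen denoiser to produce one design.
    """
    return [perturb_embedding(embedding, sigma, seed=seed + i) for i in range(n)]


# Usage: larger sigma moves variants farther from the learned embedding.
learned = [0.1] * 64
small = sample_variants(learned, sigma=0.05, n=8)
large = sample_variants(learned, sigma=0.5, n=8)
mean_small = sum(l2_distance(learned, v) for v in small) / len(small)
mean_large = sum(l2_distance(learned, v) for v in large) / len(large)
```

Because the same seeds are reused, the noise directions are identical across the two settings and only their magnitude changes with `sigma`, which isolates the effect of the control parameter on how far variants drift from the learned motif embedding.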
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 42