Keywords: protein design, generative models, motif scaffolding, geometric preservation, deep learning, benchmark, structural biology
TL;DR: We introduce GeomMotif, a new benchmark that systematically evaluates how well protein generation models preserve the 3D geometry of arbitrary structural fragments, a capability largely untested by current benchmarks that focus on functional motifs.
Abstract: Motif scaffolding in protein design involves generating complete protein structures while preserving the 3D geometry of designated structural fragments, analogous to image outpainting in computer vision. Current benchmarks focus on functional motifs, leaving general geometric preservation capabilities largely untested. We introduce GeomMotif, a systematic benchmark that evaluates arbitrary structural fragment preservation without requiring functional specificity. We construct 57 benchmark tasks, each containing one or two motifs with up to 7 continuous fragments, by sampling from the Protein Data Bank (PDB) to ensure a ground-truth, solvable conformation for every problem. The tasks are characterized by comprehensive structural and physicochemical properties: size, geometric context, secondary structure, hydrophobicity, charge, and degree of burial. These features enable detailed performance analysis beyond simple success rates, revealing model-specific strengths and limitations. We evaluate models using scRMSD and pLDDT for geometric fidelity and clustering for structural diversity and novelty. Our results show that sequence-based and structure-based approaches find different tasks challenging, and that geometric preservation varies significantly with structural and physicochemical context. GeomMotif provides insights complementary to function-focused benchmarks and establishes a foundation for improving protein generative models.
Primary Area: datasets and benchmarks
Submission Number: 17589
Loading