Keywords: Protein design, Self-supervised Learning, 3D Geometry, Rigidity, Flow matching, SE(3)-equivariance
TL;DR: We proposed a rigidity-based SSL framework to help protein generation.
Abstract: Protein design stands as one of biology’s most important frontiers, with the potential to transform medicine, advance human health, and drive sustainability. Protein generation, a central task in protein design, has been greatly accelerated by AI-driven models—such as FoldFlow, MultiFlow, and AlphaFlow that build on residue-wise rigidity–based modeling pioneered by AlphaFold2. Residue-wise rigid-body representations reduce structural dimensionality while enforcing chemical constraints, enabling more efficient and physically consistent protein structure generation than all-atom modeling. Despite these advances, existing models often underutilize the vast structural information available in large-scale protein datasets. This highlights the importance of pretraining, which can provide richer representations and improve generalization across diverse protein design tasks. More importantly, the challenge lies in how to fully exploit abundant, low-cost unlabeled protein datasets using unsupervised pretraining. We introduce RigidSSL, a rigidity-based pretraining framework for proteins. RigidSSL canonicalizes structures into an inertial frame, employs a two-phase workflow combining large-scale perturbations and molecular dynamics views, and applies a rigid-body flow matching objective with Invariant Point Attention to capture global geometry. This enables learning stable, geometry-aware representations that improve downstream protein generation. To evaluate the effectiveness of RigidSSL, we conduct quantitative experiments on the protein generation task. Empirically, RigidSSL outperforms previous state-of-the-art geometric pretraining algorithms, leading to improvements in unconditional generation across all metrics, including designability, novelty, and diversity, for length up to 800 residues.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 8465
Loading