Abstract: The recent breakthrough of AlphaFold3 in modeling complex biomolecular interactions, including those between proteins and ligands, nucleotides, or metal ions, creates new opportunities for protein design. In so-called inverse protein folding, the objective is to find a sequence of amino acids that adopts a target protein structure. Many inverse folding methods struggle to predict sequences for complexes that contain non-protein components, and perform poorly on complexes that adopt multiple structural states. To address these challenges, we present ADFLIP (All-atom Discrete FLow matching Inverse Protein folding), a generative model based on discrete flow matching for designing protein sequences conditioned on all-atom structural contexts. ADFLIP progressively incorporates predicted amino acid side chains as structural context during sequence generation and enables the design of dynamic protein complexes through ensemble sampling across multiple structural states. Furthermore, ADFLIP implements training-free classifier guidance sampling, which allows the incorporation of arbitrary pre-trained models to optimise the designed sequence for desired protein properties. We evaluated the performance of ADFLIP on protein complexes with small-molecule ligands, nucleotides, or metal ions, including dynamic complexes for which structure ensembles were determined by nuclear magnetic resonance (NMR). Our model achieves state-of-the-art performance in single-structure and multi-structure inverse folding tasks, demonstrating excellent potential for all-atom protein design. The code is available at https://github.com/ykiiiiii/ADFLIP.
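To make the idea of step-by-step sequence generation with ensemble sampling more concrete, the following is a minimal toy sketch, not the ADFLIP implementation. It assumes a masked, linear-schedule form of discrete flow matching, and it replaces the trained network with a hypothetical `toy_denoiser` that reads a "structure" given as a string of preferred residues. The names `toy_denoiser`, `ensemble_probs` and `sample_sequence` are illustrative assumptions and do not come from the ADFLIP code base.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "-"  # placeholder for a position that has not been designed yet


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_denoiser(seq, structure):
    """Hypothetical stand-in for a trained network (assumption).

    `structure` is a string of preferred residues, one per position.
    A real model would read all-atom coordinates and the partially
    designed sequence `seq`; this toy version ignores `seq`.
    """
    return [
        softmax([2.0 if aa == preferred else 0.0 for aa in AMINO_ACIDS])
        for preferred in structure
    ]


def ensemble_probs(seq, structures):
    """Average per-position probabilities over all structural states."""
    per_state = [toy_denoiser(seq, s) for s in structures]
    n = len(structures)
    return [
        [sum(state[pos][k] for state in per_state) / n
         for k in range(len(AMINO_ACIDS))]
        for pos in range(len(seq))
    ]


def sample_sequence(structures, num_steps=10, seed=0):
    """Start fully masked and unmask positions over `num_steps` steps."""
    rng = random.Random(seed)
    length = len(structures[0])
    seq = [MASK] * length
    for step in range(num_steps):
        # Linear schedule: dt / (1 - t) = 1 / (num_steps - step),
        # which is exactly 1.0 on the final step.
        p_unmask = 1.0 / (num_steps - step)
        probs = ensemble_probs(seq, structures)
        for pos in range(length):
            if seq[pos] == MASK and rng.random() < p_unmask:
                seq[pos] = rng.choices(AMINO_ACIDS, weights=probs[pos])[0]
    return "".join(seq)


# Two toy "conformers" that disagree only at the last position.
designed = sample_sequence(["ACDE", "ACDF"])
```

Averaging probabilities is only one way to combine structural states; averaging logits, or weighting states by their population, would be equally simple variations of this sketch.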
Lay Summary: Designing proteins with specific functions is a major challenge in biology, especially when they must work together with other molecules like DNA, small drugs, or metal ions. These interactions are often complex and flexible, and many existing tools struggle when protein structures can change shape or involve non-protein components.
We developed ADFLIP, a new method for designing protein sequences based on detailed 3D atomic structures. Unlike earlier approaches, ADFLIP takes into account all atoms in the environment, such as side chains and interacting molecules, and builds the sequence step by step using a technique called discrete flow matching. It can also be guided by other models to optimize for specific goals, like improving stability or binding strength.
We evaluated ADFLIP on challenging protein design tasks, including flexible structures captured by NMR experiments. It outperformed existing methods and offers a powerful new tool for designing realistic and functional proteins.
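As a rough illustration of how sampling can be guided by another model without retraining, the sketch below reweights a per-position amino acid distribution by the exponential of a property score and renormalises. This is a generic form of guidance chosen for illustration and is not necessarily the exact scheme used in ADFLIP. The scoring function `toy_property_score` is a hypothetical stand-in for a pre-trained property predictor, and `scale` is an assumed guidance strength.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")


def toy_property_score(aa):
    """Hypothetical property predictor (assumption): favours hydrophobic residues."""
    return 1.0 if aa in HYDROPHOBIC else 0.0


def guide(probs, scale=2.0):
    """Reweight p(aa) by exp(scale * score(aa)) and renormalise to sum to 1."""
    weights = [
        p * math.exp(scale * toy_property_score(aa))
        for aa, p in zip(AMINO_ACIDS, probs)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Start from a uniform distribution and tilt it towards the desired property.
uniform = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
guided = guide(uniform)
```

With `scale=0.0` the distribution is returned unchanged, and larger values push more probability mass towards residues that the scoring function prefers.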
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/ykiiiiii/ADFLIP
Primary Area: Applications->Health / Medicine
Keywords: discrete diffusion, inverse folding, proteins, nucleotides, ions, guidance, nmr, dynamic structure
Submission Number: 6389