Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY-SA 4.0
TL;DR: We introduce a method for full-atom protein sequence design that simultaneously predicts discrete sequence and continuous 3D sidechain coordinates.
Abstract: Leading deep learning-based methods for fixed-backbone protein sequence design do not model protein sidechain conformation during sequence generation, despite the large role that the three-dimensional arrangement of sidechain atoms plays in protein conformation, stability, and overall function. Instead, these models implicitly reason about crucial sidechain interactions based solely on backbone geometry and amino-acid sequence. To address this, we present FAMPNN (Full-Atom MPNN), a sequence design method that explicitly models both sequence identity and sidechain conformation for each residue, where the per-token distributions of a residue’s discrete amino acid identity and its continuous sidechain conformation are learned with a combined categorical cross-entropy and diffusion loss objective. We demonstrate that learning these distributions jointly is a highly synergistic task that both improves sequence recovery and achieves state-of-the-art sidechain packing. Furthermore, the benefits of explicit full-atom modeling generalize from sequence recovery to practical protein design applications, such as zero-shot prediction of experimental binding and stability measurements.
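To make the "combined categorical cross-entropy and diffusion loss" concrete, the following is a minimal per-residue sketch, not the FAMPNN implementation. It assumes the diffusion term can be illustrated as a mean squared error between denoised and true sidechain coordinates, and that the two terms are summed with a scalar weight. The function names and the `coord_weight` parameter are hypothetical and do not come from the paper or its repository.

```python
import math


def cross_entropy(logits, target_index):
    """Categorical cross-entropy for one residue, computed from raw logits."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target_index]


def denoising_mse(predicted_coords, true_coords):
    """Mean squared error over all sidechain atom coordinates.

    Assumption: this stands in for the diffusion loss; the actual
    parameterization and noise weighting may differ.
    """
    squared_errors = [
        (p - t) ** 2
        for p_atom, t_atom in zip(predicted_coords, true_coords)
        for p, t in zip(p_atom, t_atom)
    ]
    return sum(squared_errors) / len(squared_errors)


def combined_loss(logits, target_index, predicted_coords, true_coords,
                  coord_weight=1.0):
    """Sequence term plus weighted sidechain term (coord_weight is hypothetical)."""
    sequence_term = cross_entropy(logits, target_index)
    sidechain_term = denoising_mse(predicted_coords, true_coords)
    return sequence_term + coord_weight * sidechain_term


# Toy example: uniform logits over 20 amino acids, one sidechain atom
# whose predicted position is off by 1 unit along x.
loss = combined_loss(
    logits=[0.0] * 20,
    target_index=3,
    predicted_coords=[[1.0, 0.0, 0.0]],
    true_coords=[[0.0, 0.0, 0.0]],
    coord_weight=2.0,
)
```

In this toy case the sequence term equals log(20), since the logits are uniform, and the sidechain term equals 1/3, since one of three coordinates is off by one unit. Summing the two terms lets a single backward pass update a shared representation for both tasks, which is one plausible reading of the synergy the abstract reports.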
Lay Summary: Proteins are complex molecules that must fold into precise three-dimensional shapes to function properly, but most current computer methods for designing new proteins only consider the main structural backbone while ignoring the chemical groups that branch off from each building block and actually determine how proteins interact with other molecules. We have developed FAMPNN, a new computational approach that designs both the protein sequence and predicts where every atom will be positioned in space by combining traditional sequence design with an advanced modeling technique that arranges these branching chemical groups. While FAMPNN performs well on standard protein design tests, its real strength becomes apparent when predicting how proteins will actually behave in experiments, such as their stability and ability to bind to other molecules, where it significantly outperforms methods that only use backbone information. This improved performance comes from directly modeling the atomic interactions that control protein function, rather than trying to guess these important details from incomplete structural information. The research demonstrates that accounting for complete molecular architecture, rather than simplified representations, could lead to more successful protein designs for drug discovery, industrial enzyme development, and other biotechnology applications.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/richardshuai/fampnn
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Protein, Molecule, Diffusion, Graph-Neural-Network, Multi-task, MLM, Chemistry, Biology
Submission Number: 5386