FIDIA: Function-Informed Sequence Design via Inference-Aligned Policy Optimization

Minghan Li; Fengji Li; Yilin Tao; Yue Deng

Published: 30 Apr 2026; Last Modified: 24 Jun 2026; Venue: ICML 2026 spotlight; License: CC BY 4.0

TL;DR: FIDIA empowers inverse folding models to satisfy complex biological constraints while aligning training with the Best-of-N inference protocol via a theoretically grounded gradient estimator.

Abstract: Computational protein design typically employs a sequential workflow of structure generation followed by sequence (re)design. While structure generators can be explicitly conditioned on functional objectives, inverse folding models are constrained by their function-agnostic nature and sequence-structure degeneracy. More critically, the associated training objectives do not account for the *Best-of-N* (BoN) inference protocol, resulting in a fundamental training-inference misalignment. Here, we propose FIDIA, a reinforcement learning framework that enables **F**unction-**I**nformed sequence **D**esign via **I**nference-**A**ligned policy optimization. Specifically, FIDIA integrates functional constraints into composite rewards and explicitly optimizes the induced policy under BoN toward high-fitness sequence regions. We achieve this via a grounded gradient estimator that directly maximizes the expected maximum reward. FIDIA consistently outperforms both standard and RL-optimized baselines in success rate and precision on a general motif scaffolding benchmark. Further experiments on real-world cases, including vaccine and affinity-enhancing enzyme design, validate FIDIA’s efficacy in complex therapeutic and biocatalytic contexts.
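
To make the "expected maximum reward" quantity concrete, the sketch below estimates the Best-of-N value E[max of N rewards] from a batch of sampled rewards. It is an illustration only and is not taken from the paper or its code: the function name `expected_best_of_n` and the example reward values are hypothetical, and the estimator shown is the standard order-statistics average over all size-N subsets, not FIDIA's gradient estimator.

```python
from math import comb
from typing import Sequence


def expected_best_of_n(rewards: Sequence[float], n: int) -> float:
    """Unbiased estimate of E[max of n i.i.d. rewards] from m >= n samples.

    Hypothetical helper for illustration. It averages the maximum over every
    size-n subset of the m sampled rewards, using the closed form: the sample
    at 0-based rank i (ascending) is the maximum of C(i, n - 1) subsets.
    """
    m = len(rewards)
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= len(rewards)")
    ordered = sorted(rewards)
    # comb(i, n - 1) is 0 whenever i < n - 1, so low-ranked samples drop out.
    weighted = sum(comb(i, n - 1) * r for i, r in enumerate(ordered))
    return weighted / comb(m, n)


# Made-up rewards for three sampled sequences (not data from the paper).
sample_rewards = [0.1, 0.5, 0.9]
best_of_1 = expected_best_of_n(sample_rewards, 1)  # plain mean reward
best_of_2 = expected_best_of_n(sample_rewards, 2)  # mean of pairwise maxima
```

With N = 1 the estimate reduces to the ordinary mean reward; as N grows, the weight shifts toward the highest-reward samples, which is the difference between an expected-reward objective and a Best-of-N objective.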

Lay Summary: Current AI models for structure-based protein sequence design often generate candidates that fit a target 3D structure but lack desired biological functions. Furthermore, their training ignores real-world lab constraints, where only a few top candidates can be affordably tested. We developed FIDIA, a reinforcement learning framework that aligns AI training with this real-world screening process. Using a reward-driven system, we guide the model with biological feedback to design highly optimized candidates. FIDIA significantly improves the success rate of generating functional sequences from structures, paving the way for accelerating real-world therapeutic applications, such as the development of novel vaccines and enzymes.

Link To Code: https://github.com/deng-ai-lab/FIDIA-code

Primary Area: Applications->Chemistry, Physics, and Earth Sciences

Keywords: Protein Inverse Folding; Best-of-N Inference Alignment; Gradient Estimation

Submission Number: 32286