Track: Tiny paper track (up to 4 pages)
Keywords: reinforcement learning, hallucination, antibody design
TL;DR: We augment gradient-based protein hallucination with reinforcement learning to optimize non-differentiable objectives, achieving a 2.3× improvement in nanobody design acceptance rates.
Abstract: Gradient-based ``hallucination'' methods such as Germinal and BindCraft enable efficient protein binder design by optimizing a continuous relaxation of the sequence (a logit matrix) using gradients from differentiable structure predictors and protein language models.
However, many practically useful objectives for ranking or filtering designs are non-differentiable with respect to sequence (e.g., external confidence metrics from AlphaFold3 or Chai, experimental readouts, or arbitrary black-box scores), preventing direct backpropagation through the objective.
We extend the Germinal pipeline with an \emph{optional} policy-gradient update on the sequence logits, enabling direct optimization of black-box rewards while preserving Germinal's differentiable optimization backbone.
Our implementation reuses Germinal's existing filter metrics as modular reward components and supports both Chai-1 and AlphaFold3 backends for reward evaluation.
On a nanobody design task against PD-L1, adding a small policy-gradient term during the \emph{high-softness} portion of Germinal's optimization yields a 2.3$\times$ improvement in acceptance rate among processed trajectories, from $0.181\pm0.107$ to $0.342\pm0.097$ (mean$\pm$std over six seeds), while maintaining comparable confidence metrics for accepted designs.
These results suggest that combining policy-gradient-based black-box optimization with gradient-based hallucination improves in-silico design success rates, and may translate to improved downstream wet-lab outcomes.
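The policy-gradient update on sequence logits described above can be sketched as a REINFORCE-style step: sample discrete sequences from the logit-defined categorical policy, score them with a black-box reward, and push the logits toward above-baseline samples. Everything below (function names, the mean-reward baseline, the toy reward, all hyperparameters) is an illustrative assumption, not the Germinal implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def policy_gradient_step(logits, reward_fn, n_samples=8, lr=0.5):
    """One REINFORCE update on an (L x A) sequence logit matrix.

    Samples n_samples discrete sequences from the per-position categorical
    policy, scores each with the black-box reward, and nudges the logits
    toward samples that beat the mean-reward baseline.
    """
    L, A = logits.shape
    probs = softmax(logits)
    rewards, grads = [], []
    for _ in range(n_samples):
        # Sample one residue per position from the categorical policy.
        seq = np.array([rng.choice(A, p=probs[i]) for i in range(L)])
        rewards.append(reward_fn(seq))
        # Gradient of log pi(seq) w.r.t. the logits is one-hot(seq) - probs.
        grads.append(np.eye(A)[seq] - probs)
    advantages = np.asarray(rewards) - np.mean(rewards)  # baseline for variance reduction
    g = np.mean([a * grad for a, grad in zip(advantages, grads)], axis=0)
    return logits + lr * g  # ascend the expected black-box reward

# Toy check: a 1-position, 2-letter "design" whose reward is 1 iff letter 0 is chosen.
logits = np.zeros((1, 2))
for _ in range(100):
    logits = policy_gradient_step(logits, lambda seq: float(seq[0] == 0))
p_correct = softmax(logits)[0, 0]  # probability mass on the rewarded letter
```

Note that because the update subtracts a mean-reward baseline, a constant reward produces exactly zero gradient, so this term can be blended with Germinal's differentiable losses without perturbing positions the reward is indifferent to.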
Submission Number: 70