Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

Published: 26 Jan 2026, Last Modified: 08 Mar 2026, ICLR 2026 Oral, CC BY 4.0
Keywords: binder design, protein design, flow matching, hallucination, inference-time scaling, generative modeling, diffusion models
TL;DR: We introduce a novel method for state-of-the-art structure-based protein binder design that combines flow matching-based generative pretraining with inference-time compute scaling techniques.
Abstract: Protein interaction modeling is central to protein design, a field that machine learning has transformed, with applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is cast either as conditional generative modeling or as sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Proteina-Complexa, a novel fully atomistic binder generation method that unifies both paradigms. We extend recent flow-based latent protein generation architectures and leverage domain-domain interactions within computationally predicted monomeric protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this dataset enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Proteina-Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We also demonstrate interface hydrogen bond optimization, fold class-guided binder generation, and extensions to small molecule targets and enzyme design tasks, again surpassing prior methods. Code, models and new data will be publicly released.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 12999
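The abstract combines a generative prior trained with flow matching and extra inference-time compute. The toy sketch below is a hedged illustration of that general idea only. It is not Proteina-Complexa's architecture, data, or search procedure. It assumes a hypothetical 1D problem in which the flow-matching velocity field is known in closed form (noise N(0, 1) mapped to data N(mu, s^2) under the linear interpolant), so no network is trained. Samples are drawn by Euler integration of the ODE, and "test-time compute" is spent through plain best-of-N selection under a hypothetical `reward` function. Best-of-N is a generic stand-in chosen for simplicity, not the paper's optimization strategy. All names (`velocity`, `sample`, `reward`, `best_of_n`) and constants are illustrative assumptions.

```python
import random


def velocity(x, t, mu=2.0, s=0.5):
    """Closed-form flow-matching velocity for a toy 1D problem (hypothetical stand-in
    for a learned network). Noise N(0, 1) is transported to data N(mu, s^2) along the
    linear interpolant x_t = (1 - t) * x0 + t * x1, whose marginal is
    N(t * mu, sigma_t^2) with sigma_t^2 = (1 - t)^2 + (t * s)^2."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    sigma_dsigma = t * s * s - (1.0 - t)  # sigma_t * d(sigma_t)/dt
    return mu + (sigma_dsigma / var_t) * (x - t * mu)


def sample(rng, n_steps=100):
    """Draw one sample from the prior by Euler-integrating dx/dt = v(x, t) from t=0 to t=1."""
    x = rng.gauss(0.0, 1.0)  # start from Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x += velocity(x, i * dt) * dt
    return x


def reward(x, target=2.4):
    """Hypothetical reward to maximize. In binder design this role might be played by a
    structure-predictor confidence score; here it is just a negative squared distance."""
    return -(x - target) ** 2


def best_of_n(n, seed=0):
    """Spend more inference compute (larger n) to select a higher-reward prior sample."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        x = best_of_n(n)
        print(f"n={n:3d}  best x={x:.3f}  reward={reward(x):.4f}")
```

Because each call to `sample` consumes exactly one random draw, a larger `n` with the same seed evaluates a superset of the candidates seen by a smaller `n`. The selected reward can therefore only stay the same or improve as the budget grows, which mirrors, in a very simplified form, the idea of comparing methods under normalized compute budgets.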