Keywords: Biology Foundation Models, Drug Discovery, LLM Reasoning
Abstract: Deep learning in de novo protein design has achieved atomic-level fidelity. However, existing models remain largely non-deliberative: they directly synthesize molecular geometries without explicitly reasoning about which residues or interactions are functionally essential. As a result, design decisions are entangled with continuous sampling dynamics, limiting interpretability, controllability, and systematic reuse of biochemical knowledge.
We introduce Proteo-R1, a reasoning-guided protein design framework that explicitly decouples molecular understanding from geometric generation. Proteo-R1 adopts a dual-expert architecture, in which a multimodal large language model (LLM) serves as an understanding expert and analyzes protein sequences, structures, and textual context to identify key functional residues that govern binding and specificity. These residue-level decisions are then passed to a separate diffusion-based generation expert, which performs conditional co-design while respecting the fixed interaction anchors.
This factorization mirrors how human experts approach molecular engineering: first reasoning about critical interactions, then optimizing geometry subject to those constraints. By operationalizing reasoning as explicit residue-level commitments rather than latent textual guidance, Proteo-R1 achieves stable, interpretable, and modular integration of LLM reasoning with advanced geometric generative models. Code and demos are at https://smiles724.github.io/r1/.
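The decoupling described in the abstract can be illustrated with a toy sketch. This is not the Proteo-R1 implementation: every name here (`ResidueCommitments`, `understanding_expert`, `generation_expert`, `co_design`) is hypothetical, the "understanding expert" is a trivial heuristic standing in for the multimodal LLM, and the "generation expert" is random sequence mutation standing in for diffusion-based co-design. The only point it shows is the interface: reasoning yields explicit residue-level commitments, and generation is free to change everything except those anchors.

```python
import random
from dataclasses import dataclass
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass(frozen=True)
class ResidueCommitments:
    """Explicit residue-level decisions: position -> residue to keep fixed."""
    anchors: Dict[int, str]


def understanding_expert(sequence: str, context: str) -> ResidueCommitments:
    # ASSUMPTION: placeholder for the multimodal LLM. A toy rule marks
    # aromatic residues (W, Y, F) as anchors; `context` is unused here.
    anchors = {i: aa for i, aa in enumerate(sequence) if aa in "WYF"}
    return ResidueCommitments(anchors=anchors)


def generation_expert(sequence: str, commitments: ResidueCommitments,
                      steps: int = 20, seed: int = 0) -> str:
    # ASSUMPTION: placeholder for the diffusion-based generator. It only
    # resamples positions that were not committed by the understanding expert.
    rng = random.Random(seed)
    design = list(sequence)
    free = [i for i in range(len(design)) if i not in commitments.anchors]
    for _ in range(steps):
        if not free:
            break
        design[rng.choice(free)] = rng.choice(AMINO_ACIDS)
    return "".join(design)


def co_design(sequence: str, context: str, seed: int = 0) -> str:
    # Stage 1: reason about which residues matter. Stage 2: generate around them.
    commitments = understanding_expert(sequence, context)
    return generation_expert(sequence, commitments, seed=seed)
```

Because the commitments are a plain data object rather than latent text guidance, they can be inspected, edited, or reused before the generation stage runs, which is the modularity the abstract argues for.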
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 5