Towards A Generative Protein Evolution Machine with DPLM-Evo

Published: 02 Mar 2026, Last Modified: 05 Mar 2026, Venue: GEM 2026, License: CC BY 4.0
Keywords: evolutionary discrete diffusion model, protein sequence generation, protein sequence evolutionary modeling
Abstract: Proteins are shaped by evolution under biophysical and functional constraints. Protein language models can learn rich evolutionary constraints, and discrete diffusion-based PLMs (DPLMs) are promising for both understanding and generation. However, existing DPLMs rely on masking-based diffusion, which is a loose proxy for evolution and struggles to model the edit operations that drive sequence change in nature: substitutions and insertions/deletions (indels). In this paper, we present DPLM-Evo, an evolutionary discrete diffusion protein language model that explicitly predicts substitution, insertion, and deletion actions during denoising. To make indel-aware generation tractable, we introduce a latent alignment formulation that supports variable-length sequences. To make substitution corruption informative, we propose a contextual evolutionary noising kernel that generates biologically plausible mutations. On ProteinGym, DPLM-Evo achieves state-of-the-art mutation effect prediction in the single-sequence setting, and it enables variable-length generation and post-editing via explicit edit trajectories.
Presenter: ~Xinyou_Wang1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does not fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 89