A Variational Perspective on Generative Protein Fitness Optimization

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: The goal of protein fitness optimization is to discover new protein variants with enhanced fitness for a given use. The vast search space and the sparsely populated fitness landscape, along with the discrete nature of protein sequences, pose significant challenges when trying to determine the gradient towards configurations with higher fitness. We introduce *Variational Latent Generative Protein Optimization* (VLGPO), a variational perspective on fitness optimization. Our method embeds protein sequences in a continuous latent space to enable efficient sampling from the fitness distribution and combines a (learned) flow matching prior over sequence mutations with a fitness predictor to guide optimization towards sequences with high fitness. VLGPO achieves state-of-the-art results on two different protein benchmarks of varying complexity. Moreover, the variational design with explicit prior and likelihood functions offers a flexible plug-and-play framework that can be easily customized to suit various protein design tasks.
Lay Summary: We consider the task of protein fitness optimization, which aims to improve a specific function of a protein by modifying its amino acid sequence. Because the space of possible sequences is vast, computational approaches can help by suggesting new protein candidates. We use an approach based on generative models, which learn the distribution of a data set of protein mutants. We then employ a second model to steer the generation process toward sequences of higher fitness. We demonstrate the effectiveness of our approach on two proteins, AAV and GFP, each with two design tasks of different difficulty (medium and hard).
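The idea described above, a generative prior over a continuous latent space steered by a fitness predictor, can be illustrated with a minimal toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes a one-dimensional latent, replaces the learned flow matching network with the analytic velocity field between two Gaussians, uses a quadratic function as a stand-in fitness predictor, and assumes guidance is applied by adding a scaled fitness gradient to the velocity. The actual guidance rule, encoder, and predictor in VLGPO may differ.

```python
import random


def prior_velocity(x, t, data_mean=0.0, data_std=1.0):
    """Analytic marginal velocity for the path x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, 1) and x1 ~ N(data_mean, data_std^2) independent.
    ASSUMPTION: stands in for a learned flow matching network."""
    var_t = (1.0 - t) ** 2 + (t * data_std) ** 2
    cov = t * data_std ** 2 - (1.0 - t)
    # Conditional mean E[x1 - x0 | x_t = x] for jointly Gaussian variables.
    return data_mean + cov / var_t * (x - t * data_mean)


def fitness(x, target=2.0):
    """ASSUMPTION: toy differentiable fitness predictor, maximal at `target`."""
    return -(x - target) ** 2


def fitness_grad(x, target=2.0):
    """Gradient of the toy fitness predictor with respect to the latent."""
    return -2.0 * (x - target)


def sample(x0, guidance=0.0, steps=100):
    """Euler integration of the prior flow from t = 0 to t = 1,
    optionally adding a scaled fitness gradient at every step."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x += dt * (prior_velocity(x, t) + guidance * fitness_grad(x))
    return x


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

unguided = [sample(z) for z in noise]                # samples from the prior only
guided = [sample(z, guidance=0.5) for z in noise]    # prior plus fitness guidance

mean_fit_unguided = mean([fitness(x) for x in unguided])
mean_fit_guided = mean([fitness(x) for x in guided])
print(mean_fit_unguided, mean_fit_guided)
```

In this sketch the prior and the fitness term enter the update as separate functions, so either one can be swapped without touching the other, which is the plug-and-play property the abstract attributes to the explicit prior and likelihood. The `guidance` weight trades off staying close to the prior against moving toward higher predicted fitness.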
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Health / Medicine
Keywords: protein design, variational methods, generative modeling
Submission Number: 10715