Optimizing protein fitness using Gibbs sampling with Graph-based Smoothing

Published: 19 Jun 2023, Last Modified: 28 Jul 20231st SPIGM @ ICML PosterEveryoneRevisionsBibTeX
Keywords: protein design, discrete optimization, mcmc, proteins
TL;DR: State-of-the-art Gibbs With Gradient sampling method with graph Laplacian regularization for protein fitness optimization.
Abstract: The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Gibbs sampling with Graph-based Smoothing (GGS) which iteratively applies Gibbs with gradients to propose advantageous mutations using} graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We study the GFP and AAV design problems, ablations, and baselines to elucidate the results. Code: https://github.com/kirjner/GGS.
Submission Number: 17
Loading