Keywords: protein design, discrete optimization, Gibbs sampling, protein engineering
TL;DR: A state-of-the-art sampling method, inspired by Gibbs With Gradients, that uses graph Laplacian regularization for protein fitness optimization.
Abstract: The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose $\textbf{Bi}$-level $\textbf{G}$ibbs sampling with $\textbf{G}$raph-based $\textbf{S}$moothing (BiGGS), which uses the gradients of a trained fitness predictor to sample many mutations towards higher fitness. Bi-level Gibbs first samples sequence locations and then sequence edits. We introduce graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We evaluate on the GFP and AAV design problems and include ablations and baseline comparisons to elucidate the results.
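The two ideas in the abstract, gradient-guided bi-level sampling and graph-based smoothing, can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions. The fitness predictor is replaced by a toy function whose gradient with respect to the one-hot sequence is supplied directly as `grad_fn`. Each single substitution is scored with a first-order (Gibbs-With-Gradients-style) estimate, and positions are scored by the log-sum-exp of their substitution scores. With that choice, drawing a position and then a residue is equivalent to one softmax over all single mutations, but the actual BiGGS position distribution may differ. Smoothing is shown as Tikhonov regularization with a graph Laplacian over a hypothetical adjacency matrix `W` of training sequences. The names `laplacian_smooth`, `bilevel_gibbs_step`, `grad_fn`, `temp` and `gamma` are illustrative only.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
A = len(ALPHABET)


def one_hot(seq):
    """Encode a protein sequence as an (L, A) one-hot matrix."""
    x = np.zeros((len(seq), A))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x


def laplacian_smooth(y, W, gamma=1.0):
    """Assumed smoothing: argmin_f ||f - y||^2 + gamma * f^T L f, where L = D - W.

    W is a symmetric adjacency matrix over training sequences (hypothetical graph).
    The closed-form minimizer is f = (I + gamma * L)^{-1} y.
    """
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + gamma * L, y)


def softmax(z):
    e = np.exp(z - z.max())  # entries equal to -inf become probability 0
    return e / e.sum()


def logsumexp_rows(z):
    m = z.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=1, keepdims=True))).ravel()


def bilevel_gibbs_step(seq, grad_fn, rng, temp=1.0):
    """Propose one substitution: sample a position, then a residue at that position."""
    x = one_hot(seq)
    g = grad_fn(x)  # (L, A) gradient of predicted fitness w.r.t. the one-hot input
    # First-order estimate of the fitness change for every single substitution:
    # d[i, a] = g[i, a] - g[i, current residue at i]
    d = g - (g * x).sum(axis=1, keepdims=True)
    d[x.astype(bool)] = -np.inf  # forbid "mutating" a residue to itself
    z = d / temp
    # Level 1: sample a sequence location from the log-sum-exp of its substitution scores.
    i = rng.choice(len(seq), p=softmax(logsumexp_rows(z)))
    # Level 2: sample a sequence edit (new residue) at the chosen location.
    a = rng.choice(A, p=softmax(z[i]))
    return seq[:i] + ALPHABET[a] + seq[i + 1:]
```

For a linear toy predictor the first-order estimate is exact. With a neural predictor it is only a local approximation, which is where noisy gradients can produce false positives. Smoothing the training labels on a sequence graph before fitting the predictor is one way to reduce that noise.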
Supplementary Material: zip
Submission Number: 3620