Designing active and thermostable enzymes with sequence-only predictive models

09 Oct 2022 (modified: 05 May 2023), LMRL 2022 Paper, Readers: Everyone
Keywords: protein design, enzyme design, fitness prediction, thermostability, distribution shift, language models
TL;DR: We propose a method for using data-driven fitness models to design thermostable, active enzymes, which does not require a structure, experimental measurements of activity, curation of homologous sequences, or family-specific thermostability data.
Abstract: Data-driven models of fitness can be useful in designing novel proteins with desired properties, but many questions remain regarding how and in what settings they should be used. Here, we ask: How can we use predictive models of protein fitness, whose predictions we might not always trust, to design protein sequences enhanced for multiple fitness functions? We propose a general approach for doing so, and apply it to design novel variants of eight different acylphosphatase and lysozyme wild types, intended to be more thermostable and at least as catalytically active as the wild types. Our method does not require a structure, experimental measurements of activity, curation of homologous sequences, or family-specific thermostability data. Experimental characterization of our designed sequences, and of sequences designed by PROSS, a competitive baseline method for improving protein thermostability, is underway, and results are forthcoming.