Track: Machine learning: computational method and/or computational results
Nature Biotechnology: No
Keywords: AI, Machine Learning, Biology, Protein Engineering, LLMs
TL;DR: Designing recombinant proteins with reduced immunogenicity using protein language models
Abstract: Protein recombination has long been a key method in protein engineering for diversifying and optimizing sequences. We extend this approach with a protein language model: we find that when attention in the language model is represented as a spline, abrupt transitions in the spline identify optimal crossover sites for recombination. As we show, these sites also correlate with transitions between secondary structure elements in the corresponding protein structure. We use these sites to guide the recombination of sequence blocks from diverse sources using Markov chain Monte Carlo (MCMC) sampling. Language models also enable the generation of novel recombinant blocks beyond traditional multiple sequence alignments (MSAs), increasing diversity, while a direct preference optimization algorithm is used to fine-tune these blocks for reduced immunogenicity. This method integrates modern deep learning architectures with traditional protein engineering techniques to improve the success rate of libraries designed for wet-lab verification.
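The crossover-site idea can be illustrated with a minimal sketch. It assumes the model's attention has already been reduced to one score per residue (the abstract does not say how), and it swaps the spline representation for a simpler windowed mean difference to flag abrupt transitions. The function names, parameters, and toy data are hypothetical and are not taken from the submission; MCMC sampling and preference optimization are not covered.

```python
from typing import List, Sequence


def transition_scores(profile: Sequence[float], window: int = 3) -> List[float]:
    """Score position i by |mean(profile[i:i+w]) - mean(profile[i-w:i])|.

    This is a simple change-point score, used here as a stand-in for
    detecting abrupt transitions in a fitted spline (an assumption).
    Positions too close to either end to fill both windows score 0.
    """
    n = len(profile)
    scores = [0.0] * n
    for i in range(window, n - window + 1):
        left = sum(profile[i - window:i]) / window
        right = sum(profile[i:i + window]) / window
        scores[i] = abs(right - left)
    return scores


def crossover_sites(profile: Sequence[float], n_sites: int = 2,
                    min_gap: int = 3, window: int = 3) -> List[int]:
    """Greedily pick up to n_sites highest-scoring positions, at least min_gap apart."""
    scores = transition_scores(profile, window)
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    chosen: List[int] = []
    for i in ranked:
        if scores[i] == 0.0 or len(chosen) == n_sites:
            break
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
    return sorted(chosen)


def recombine(parents: Sequence[str], sites: Sequence[int],
              choice: Sequence[int]) -> str:
    """Build a chimera whose k-th block comes from parents[choice[k]].

    Parents are assumed to be aligned and of equal length; a block starts
    at each crossover site and runs up to the next one.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must have equal length")
    bounds = [0] + sorted(sites) + [length]
    if len(choice) != len(bounds) - 1:
        raise ValueError("need exactly one parent index per block")
    return "".join(parents[choice[k]][bounds[k]:bounds[k + 1]]
                   for k in range(len(bounds) - 1))


# Toy per-residue profile with two abrupt transitions (made-up numbers).
profile = [0.1] * 5 + [0.9] * 5 + [0.2] * 5
sites = crossover_sites(profile)                     # [5, 10]
chimera = recombine(["A" * 15, "C" * 15], sites, [0, 1, 0])
print(sites, chimera)
```

In a fuller pipeline, the `choice` vector is what a sampler such as MCMC would propose and accept or reject under some fitness score; here it is fixed by hand.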
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 121