Likelihood-based fine-tuning of protein language models for few-shot fitness prediction and design

Published: 17 Jun 2024, Last Modified: 16 Jul 2024. ML4LMS Poster. Licence: CC BY 4.0
Keywords: protein design, fine-tuning, protein language models, Bayesian optimisation
Abstract: Although various schemes have been proposed for exploiting the distributional knowledge captured by protein language models (PLMs) to enhance supervised fitness prediction and design, the lack of head-to-head comparisons across prediction strategies and classes of PLM has made it challenging to identify the best-performing methods and to understand the factors contributing to performance. Here, we extend previously proposed ranking-based loss functions to adapt the likelihoods of family-based and masked protein language models, and demonstrate that the best configurations outperform state-of-the-art approaches based on frozen embeddings in the low-data setting. Furthermore, we propose ensembling strategies that exploit the strong dependence of the mutational distributions learned by PLMs on sequence context, and show that they can guide efficient optimisation strategies over fitness landscapes.
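To illustrate the flavour of ranking-based fine-tuning objectives referred to in the abstract, here is a minimal, hypothetical sketch of a Bradley-Terry-style pairwise ranking loss over model log-likelihoods. The function name, the pure-Python implementation, and the toy inputs are assumptions for illustration only; they are not the authors' implementation, which operates on PLM likelihoods during fine-tuning.

```python
import math

def pairwise_ranking_loss(log_likelihoods, fitness):
    """Illustrative Bradley-Terry-style pairwise ranking loss (a sketch,
    not the paper's implementation).

    For each ordered pair of variants (i, j) where i has higher measured
    fitness than j, penalise the model unless its log-likelihood also
    ranks i above j. The per-pair term is -log sigmoid(ll_i - ll_j),
    which is small when the model agrees with the fitness ordering and
    large when it disagrees.
    """
    losses = []
    n = len(fitness)
    for i in range(n):
        for j in range(n):
            if fitness[i] > fitness[j]:
                diff = log_likelihoods[i] - log_likelihoods[j]
                # softplus(-diff) == -log sigmoid(diff)
                losses.append(math.log(1.0 + math.exp(-diff)))
    # Average over the comparable pairs (guard against no pairs).
    return sum(losses) / max(len(losses), 1)

# Toy check: a model whose log-likelihoods agree with the fitness
# ranking should incur lower loss than one that ranks them in reverse.
fitness = [3.0, 2.0, 1.0]
agrees = pairwise_ranking_loss([2.0, 1.0, 0.0], fitness)
disagrees = pairwise_ranking_loss([0.0, 1.0, 2.0], fitness)
```

Losses of this form depend only on the ordering of likelihoods within a batch, which is what makes them suitable for adapting PLM likelihoods, whose absolute scale is not calibrated to fitness, to few-shot supervision.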
Submission Number: 118