Fine-tuning protein Language Models by ranking protein fitness

Published: 27 Oct 2023, Last Modified: 22 Nov 2023
Venue: GenBio@NeurIPS2023 Poster
Keywords: Protein language model, Fitness prediction, Ranking-based fine tuning
TL;DR: Our proposed framework for few-shot fine-tuning of protein language models yields substantial improvements in fitness prediction.
Abstract: Self-supervised protein language models (pLMs) have demonstrated significant potential for predicting the impact of mutations on protein function and fitness, which is crucial for protein design. Existing approaches further condition pLMs on natural language or multiple sequence alignments (MSAs) to produce proteins of a specific family or function. However, such conditioning is usually too coarse-grained to express the target function, and the resulting models still correlate weakly with fitness and struggle to generate fit variants. To address this challenge, we propose a fine-tuning framework that aligns a pLM to a specific fitness landscape by ranking mutants. We show that how the ranked pairs are constructed is crucial when fine-tuning pLMs, and we provide a simple yet effective construction method that improves fitness prediction across various datasets. In experiments on ProteinGym, our method shows substantial improvements on fitness prediction tasks even when using fewer than 200 labeled examples. Furthermore, we demonstrate that our approach excels at fitness optimization tasks.
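To make the abstract's core idea concrete, the sketch below illustrates one common formulation of ranking-based fine-tuning: a logistic (Bradley-Terry-style) pairwise ranking loss applied to scalar fitness scores produced by a pLM. This is an assumption-laden illustration, not the authors' implementation; the names `score_fn`, `pairwise_ranking_loss`, and `fine_tune_step` are hypothetical, and the paper's exact objective and pair-construction strategy are only described in the full text.

```python
# Minimal sketch (assumed formulation, not the paper's exact method):
# fine-tune a pLM so it assigns higher scores to fitter mutants,
# using a pairwise logistic ranking loss over labeled mutant pairs.

import torch
import torch.nn.functional as F


def pairwise_ranking_loss(score_high: torch.Tensor, score_low: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry / logistic ranking loss: softplus(s_low - s_high)
    = log(1 + exp(s_low - s_high)), minimized when the fitter mutant
    of each pair receives the higher score."""
    return F.softplus(score_low - score_high).mean()


def fine_tune_step(model, optimizer, pairs, score_fn):
    """One gradient update on a batch of (fitter_seq, less_fit_seq) pairs
    constructed from the few-shot labeled data. `score_fn(model, seq)`
    is assumed to return a scalar fitness estimate for a sequence,
    e.g., the pLM's summed log-likelihood of the mutant."""
    s_high = torch.stack([score_fn(model, hi) for hi, _ in pairs])
    s_low = torch.stack([score_fn(model, lo) for _, lo in pairs])
    loss = pairwise_ranking_loss(s_high, s_low)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the sketch takes the ranked pairs as given; deciding which mutants to pair (the pair-construction step) is precisely the design choice the abstract highlights as crucial.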
Supplementary Materials: zip
Submission Number: 84