Don’t throw away that linear head: Few-shot protein fitness prediction with generative models

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Keywords: language modeling, proteins, fitness prediction
Abstract: Predicting the fitness, i.e., functional value, of a protein sequence is an important and challenging task in biology, particularly due to the scarcity of assay-labeled data. Traditional approaches rely on transfer learning from evolutionary data, yet discard useful information from a generative model’s learned probability distribution. We propose generative fitness fine-tuning, termed gf-tuning, which uses the generative model’s log probabilities as logits in a pairwise ranking loss, allowing the full distribution learned during unsupervised training to be repurposed for fine-tuning on assay-labeled fitness data. We demonstrate that gf-tuning outperforms existing baselines across a variety of few-shot fitness prediction settings, including low-homology and highly epistatic systems, as well as generalizing from single to multiple mutations. Generative fitness fine-tuning offers an effective strategy for few-shot fitness prediction that could enable advances in understanding and engineering proteins.
One-sentence Summary: Few-shot protein fitness prediction by fine-tuning generative models as pairwise classifiers
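
The abstract describes repurposing a generative model’s sequence log-probabilities as logits in a pairwise ranking loss. The sketch below illustrates one plausible way such a loss could be set up in PyTorch; the ToyProteinLM placeholder, the log_prob interface, and the gf_tuning_loss name are illustrative assumptions, not the authors’ released implementation.

```python
# Minimal sketch (assumed, not the paper's code): pairwise ranking fine-tuning
# driven by a generative model's sequence log-probabilities.
import torch
import torch.nn.functional as F


class ToyProteinLM(torch.nn.Module):
    """Placeholder generative model: an independent categorical distribution over
    20 amino acids per position, just so the sketch runs end to end."""

    def __init__(self, seq_len: int, vocab: int = 20):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(seq_len, vocab))

    def log_prob(self, seqs: torch.Tensor) -> torch.Tensor:
        """seqs: (batch, seq_len) integer-encoded sequences -> (batch,) log-probabilities."""
        log_p = F.log_softmax(self.logits, dim=-1)   # (seq_len, vocab)
        pos = torch.arange(seqs.shape[1])
        return log_p[pos, seqs].sum(dim=-1)          # sum of per-residue log-probs


def gf_tuning_loss(model, seqs_a, seqs_b, labels):
    """Pairwise ranking loss: the difference of sequence log-probabilities is
    treated as the logit for "a is fitter than b" (labels are 0. or 1.)."""
    logits = model.log_prob(seqs_a) - model.log_prob(seqs_b)
    return F.binary_cross_entropy_with_logits(logits, labels)


# Tiny usage example on random data (illustrative only).
if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyProteinLM(seq_len=8)
    seqs_a = torch.randint(0, 20, (16, 8))
    seqs_b = torch.randint(0, 20, (16, 8))
    labels = torch.randint(0, 2, (16,)).float()
    loss = gf_tuning_loss(model, seqs_a, seqs_b, labels)
    loss.backward()   # gradients flow back into the generative model's parameters
    print(float(loss))
```

Because the ranking logit is a difference of log-probabilities, gradients from the labeled pairs update the generative model itself rather than a separate linear head, which is the intuition behind reusing the full learned distribution.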
