Few-Shot prediction of the experimental functional measurements for proteins with single point mutations

Michael Bikman; Rachel Kolodny; Margarita Osadchy

Published: 04 Mar 2024, Last Modified: 29 Apr 2024. Venue: GEM Poster. License: CC BY 4.0

Track: Machine learning: computational method and/or computational results

Keywords: effects of single point mutations; MAVE; protein foundation models; few-shot learning; prediction of experimental values for deep mutational scans; protein functional landscape

TL;DR: We propose a transformer-based model to predict experimental measurements of the effects of single point mutations, differing from previous methods, which predict ordering; our model uses few data points from the new experiment for fine-tuning.

Abstract: We use few-shot learning to train transformer-based models to predict the experimental measurements of the effects of single point mutations. Using sequence-based inputs, we pre-train our models with supervision to predict ~150,000 normalized measurements derived from various experiments (MAVE data). The pre-training is done on proteins distinct from the one under evaluation. The inputs include embeddings from a protein foundation model, sequence conservation features, and protein stability features. After pre-training, for each experiment, we rely on a random sample of 2%-5% of the values measured for the tested protein to estimate the normalization transform and to fine-tune the model for that experiment. We then apply the inverse of the estimated normalization transform to the fine-tuned model's normalized predictions for all single point mutations, thus predicting the experimental values, and evaluate these predictions against the true experimental values using the Mean Absolute Error (MAE). We compare the accuracy of predictions in different settings, including an ablation study of the contributions of the different inputs.
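The evaluation loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the normalization transform, so the z-score used here is an assumption, as are all function names; the synthetic data and the noisy "predictions" merely stand in for real measurements and for the output of the fine-tuned model.

```python
import random
import statistics

def estimate_normalization(shots):
    """Estimate (mean, std) from the few sampled measurements.
    Assumption: a z-score transform; the abstract does not state its form."""
    mean = statistics.fmean(shots)
    std = statistics.pstdev(shots)
    return mean, (std if std > 0 else 1.0)

def normalize(values, mean, std):
    """Map experimental values into the normalized space."""
    return [(v - mean) / std for v in values]

def denormalize(values, mean, std):
    """Inverse transform: map normalized predictions back to experimental units."""
    return [v * std + mean for v in values]

def mean_absolute_error(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

rng = random.Random(0)

# Synthetic stand-in for the measurements of one experiment (hypothetical data).
true_values = [rng.gauss(2.0, 0.5) for _ in range(1000)]

# Randomly sample 5% of the values as the few "shots".
n_shots = max(1, len(true_values) * 5 // 100)
shot_idx = rng.sample(range(len(true_values)), n_shots)
mean, std = estimate_normalization([true_values[i] for i in shot_idx])

# Placeholder for the fine-tuned model: the true normalized values plus noise.
normalized_pred = [z + rng.gauss(0.0, 0.1) for z in normalize(true_values, mean, std)]

# Invert the estimated transform and score against the true values.
pred = denormalize(normalized_pred, mean, std)
mae = mean_absolute_error(pred, true_values)
print(f"shots={n_shots}, MAE={mae:.3f}")
```

Because the transform is estimated from only a small sample, any error in the estimated mean and scale carries over into the de-normalized predictions, which is why the MAE is computed in the original experimental units.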

Submission Number: 37
