Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction
Keywords: Protein Language Models, Deep Mutational Scanning, Missense Variants, Pathogenicity prediction
TL;DR: We present a novel fine-tuning approach that uses Deep Mutational Scanning data to improve variant effect prediction by Protein Language Models.
Abstract: Protein Language Models (PLMs) have emerged as performant and scalable tools
for predicting the functional impact and clinical significance of protein-coding
variants, but their accuracy still lags that of experimental assays. Here, we present a novel fine-tuning
approach to improve the performance of PLMs with experimental maps of
variant effects from Deep Mutational Scanning (DMS) assays using a Normalised
Log-odds Ratio (NLR) head. We find consistent improvements on a held-out protein
test set and on independent DMS and clinical variant annotation benchmarks
from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising
source of sequence diversity and supervised training data for improving the
performance of PLMs for variant effect prediction.
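For orientation, the quantity that a log-odds-style head builds on can be sketched as follows. The abstract does not define the normalisation used in the NLR head, so this sketch shows only the plain, unnormalised log-odds ratio commonly used to score a substitution from a PLM's per-position logits. The function names, the amino-acid ordering, and the example logits are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Assumed vocabulary order for illustration; real PLM tokenizers differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def log_odds_ratio(position_logits, wild_type, mutant):
    """Plain log-odds ratio: log p(mutant) - log p(wild type) at one position.

    position_logits: 20 logits for the (typically masked) position, ordered
    as in AMINO_ACIDS. Negative values mean the model prefers the wild type.
    """
    log_probs = log_softmax(np.asarray(position_logits, dtype=float))
    mut_idx = AMINO_ACIDS.index(mutant)
    wt_idx = AMINO_ACIDS.index(wild_type)
    return float(log_probs[mut_idx] - log_probs[wt_idx])


# Hypothetical logits: the model favours alanine (A) at this position.
example_logits = np.zeros(len(AMINO_ACIDS))
example_logits[AMINO_ACIDS.index("A")] = 2.0
score = log_odds_ratio(example_logits, wild_type="A", mutant="C")
```

In a fine-tuning setting, a score of this kind would be regressed against or ranked alongside DMS measurements; how the paper normalises and supervises it is not stated in the abstract.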
Submission Number: 7