Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction

Aleix Lafita; Ferran Gonzalez; Mahmoud Hossam; Paul Smyth; Jacob Deasy; Ari L. Allyn-Feuer; Daniel D Seaton; Stephen Young

Published: 04 Mar 2024, Last Modified: 23 Apr 2024, Venue: MLGenX 2024 Poster, License: CC BY 4.0

Keywords: Protein Language Models, Deep Mutational Scanning, Missense Variants, Pathogenicity prediction

TL;DR: We present a novel fine-tuning approach that uses Deep Mutational Scanning data to improve variant effect prediction by Protein Language Models.

Abstract: Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag behind experimental accuracy. Here, we present a novel fine-tuning approach that uses a Normalised Log-odds Ratio (NLR) head to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays. We find consistent improvements on a held-out protein test set and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
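
For readers unfamiliar with log-odds scoring, the sketch below shows the plain (un-normalised) log-odds ratio that is commonly used to score a missense variant from a PLM's per-position amino-acid logits: the log-probability of the mutant residue minus that of the wild-type residue. It is an illustration only, not the NLR head itself. The abstract does not specify the normalisation or the head's architecture, so both are omitted. The names `AMINO_ACIDS`, `log_softmax`, and `log_odds_ratio` are hypothetical, and random logits stand in for real model output.

```python
import numpy as np

# Assumed ordering of the 20 standard amino acids (hypothetical vocabulary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def log_odds_ratio(logits: np.ndarray, wild_type: str,
                   position: int, mutant_aa: str) -> float:
    """Score a substitution as log p(mutant) - log p(wild type).

    logits: array of shape (sequence_length, 20), one row per residue.
    position: 0-based index of the substituted residue.
    """
    log_probs = log_softmax(logits)
    wt_idx = AMINO_ACIDS.index(wild_type[position])
    mt_idx = AMINO_ACIDS.index(mutant_aa)
    return float(log_probs[position, mt_idx] - log_probs[position, wt_idx])


# Stand-in for PLM output: random logits for a short, made-up sequence.
rng = np.random.default_rng(0)
sequence = "MKTAYIAK"
fake_logits = rng.normal(size=(len(sequence), len(AMINO_ACIDS)))

# Score the substitution K2E (0-based position 1).
score = log_odds_ratio(fake_logits, sequence, position=1, mutant_aa="E")
```

Under this convention, a negative score means the model considers the mutant residue less likely than the wild type at that position. Fine-tuning on DMS data, as the abstract describes, would fit such scores to measured variant effects.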

Submission Number: 7
