Protein Language Model Predicts Mutation Pathogenicity and Clinical Prognosis

09 Oct 2022 (modified: 05 May 2023), LMRL 2022 Paper, Readers: Everyone
Keywords: Protein language models, zero-shot learning, cancer genomics, survival analysis, mutation effect prediction
TL;DR: Protein language model predicts disease-causing mutations and patient survival in cancer.
Abstract: Accurately predicting the effects of mutations in cancer has the potential to improve existing treatments and identify novel therapeutic targets. In this paper, we provide the first evidence that large-scale pre-trained protein language models (PPLMs) are zero-shot predictors for two clinically relevant tasks: identifying disease-causing mutations and predicting patient survival rate. We then benchmark a series of state-of-the-art (SOTA) PPLMs on 2279 protein variants across 20 cancer-related genes. Our empirical results show that the PPLMs outperform the SOTA baseline, EVE, which is trained on multiple sequence alignment (MSA) data. We also demonstrate that the evolutionary index score, generated from the PPLM's softmax layer, is a good indicator of both mutation pathogenicity and patient survival rate. Our paper takes a key step toward the clinical utility of large-scale PPLMs.
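The abstract does not spell out how the score is read from the softmax layer. As a minimal sketch, assuming a common zero-shot formulation (the log-probability of the mutant residue minus that of the wild-type residue at the mutated position), the idea might look like the following. The function names, the 20-letter alphabet ordering, and the toy logits are all hypothetical illustrations, not the authors' implementation or any model's actual output.

```python
import math

# Assumption: a fixed ordering of the 20 standard amino acids.
# Real PPLM vocabularies differ and include special tokens.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_odds_score(position_logits, wild_type, mutant):
    """Return log p(mutant) - log p(wild_type) at one sequence position.

    Under this (assumed) convention, a more negative score means the
    model finds the substitution less plausible than the wild type.
    """
    probs = softmax(position_logits)
    p_wt = probs[AMINO_ACIDS.index(wild_type)]
    p_mut = probs[AMINO_ACIDS.index(mutant)]
    return math.log(p_mut) - math.log(p_wt)


# Toy logits standing in for a model's output at the mutated position.
toy_logits = [0.0] * len(AMINO_ACIDS)
toy_logits[AMINO_ACIDS.index("R")] = 3.0  # wild-type residue, favored
toy_logits[AMINO_ACIDS.index("K")] = 1.0  # candidate mutant residue

score = log_odds_score(toy_logits, wild_type="R", mutant="K")
print(f"R->K log-odds score: {score:.3f}")
```

In practice the logits would come from a pre-trained model evaluated on the wild-type (or masked) sequence, and the sign convention and any aggregation over positions may differ from this sketch.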