Language models enable zero-shot prediction of the effects of mutations on protein function

Joshua Meier; Roshan Rao; Robert Verkuil; Jason Liu; Tom Sercu; Alexander Rives

Published: 09 Nov 2021, Last Modified: 05 May 2023

Venue: NeurIPS 2021 Poster

Readers: Everyone

Keywords: Proteins, language modeling, generative biology, zero-shot learning, unsupervised learning, variant prediction

TL;DR: Using zero-shot inference, language models capture the effect of mutations on protein function, performing at state-of-the-art.

Abstract: Modeling the effect of sequence variation on function is a fundamental problem for understanding and designing proteins. Since evolution encodes information about function into patterns in protein sequences, unsupervised models of variant effects can be learned from sequence data. The approach to date has been to fit a model to a family of related sequences. The conventional setting is limited, since a new model must be trained for each prediction task. We show that using only zero-shot inference, without any supervision from experimental data or additional training, protein language models capture the functional effects of sequence variation, performing at state-of-the-art.
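To make the zero-shot idea concrete, here is a minimal sketch of one common way to score a variant with a language model: compare the log-probability the model assigns to the mutant amino acid against the wild-type amino acid at the mutated position. The abstract above does not spell out the scoring rule, so the log-odds form, the function name `log_odds_score`, and the toy probability table are all assumptions made for illustration; in practice the per-position probabilities would come from a pretrained protein language model rather than a hand-written dictionary.

```python
import math


def log_odds_score(position_log_probs, wild_type, mutations):
    """Hypothetical zero-shot variant score (illustrative, not the authors' code).

    position_log_probs: dict mapping a 0-based position to a dict of
        amino acid -> log-probability, standing in for model output.
    wild_type: the unmutated sequence as a string.
    mutations: list of (position, mutant_amino_acid) pairs.

    Returns the sum over mutated positions of
        log p(mutant) - log p(wild type).
    Negative values mean the model prefers the wild-type residue.
    """
    score = 0.0
    for pos, mutant_aa in mutations:
        wt_aa = wild_type[pos]
        log_probs = position_log_probs[pos]
        score += log_probs[mutant_aa] - log_probs[wt_aa]
    return score


# Toy data (assumed): probabilities a model might assign at position 1.
wild_type = "MKV"
toy_probs = {1: {"K": 0.7, "R": 0.2, "D": 0.1}}
toy_log_probs = {
    pos: {aa: math.log(p) for aa, p in dist.items()}
    for pos, dist in toy_probs.items()
}

score_k_to_r = log_odds_score(toy_log_probs, wild_type, [(1, "R")])
score_k_to_d = log_odds_score(toy_log_probs, wild_type, [(1, "D")])
```

Under these toy numbers, both substitutions score below zero, and K to D scores lower than K to R, which is the kind of ranking a zero-shot predictor would then compare against experimental measurements. No training on experimental data enters the score, which is what makes the setting zero-shot.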

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: https://github.com/facebookresearch/esm
