FLIP: Benchmark tasks in fitness landscape inference for proteins

Christian Dallago; Jody Mou; Kadina E Johnston; Bruce Wittmann; Nick Bhattacharya; Samuel Goldman; Ali Madani; Kevin K Yang

Published: 11 Oct 2021, Last Modified: 23 May 2023

Venue: NeurIPS 2021 Datasets and Benchmarks Track (Round 2)

Readers: Everyone

Keywords: protein design, protein landscape prediction, protein representation learning

TL;DR: A set of tasks to probe protein representations for protein engineering.

Abstract: Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. To be useful for designing proteins with desired properties, machine learning models must capture the protein sequence-function relationship, often termed the fitness landscape. Existing benchmarks like CASP or CAFA assess structure and function predictions of proteins, respectively, yet they do not target metrics relevant for protein engineering. In this work, we introduce Fitness Landscape Inference for Proteins (FLIP), a benchmark for function prediction to encourage rapid scoring of representation learning for protein engineering. Our curated splits, baselines, and metrics probe model generalization in settings relevant for protein engineering, e.g., low-resource and extrapolative. Currently, FLIP encompasses experimental data across adeno-associated virus stability for gene therapy, protein domain B1 stability and immunoglobulin binding, and thermostability from multiple protein families. To enable ease of use and future expansion to new splits, all data are presented in a standard format. FLIP scripts and data are freely accessible at https://benchmark.protein.properties.

URL: https://benchmark.protein.properties

Supplementary Material: pdf

Contribution Process Agreement: Yes

Dataset Url: https://benchmark.protein.properties

Author Statement: Yes
