Exploring zero-shot structure-based protein fitness prediction

Published: 06 Mar 2025, Last Modified: 02 May 2025, GEM, CC BY 4.0
Track: Machine learning: computational method and/or computational results
Nature Biotechnology: Yes
Keywords: Protein Function prediction, Benchmarking, Disordered Proteins, Structure based function prediction
TL;DR: We evaluate structure-based protein models for function prediction, examining how performance varies between experimental and predicted structures, for proteins with Intrinsically Disordered Regions (IDRs), and when fusing multiple modalities.
Abstract: The ability to make zero-shot predictions about the fitness consequences of protein sequence changes with pre-trained machine learning models enables many practical applications. Such models can be applied to downstream tasks like genetic variant interpretation and protein engineering without additional labeled data. The advent of capable protein structure prediction tools has led to the availability of orders of magnitude more precomputed predicted structures, giving rise to powerful structure-based fitness prediction models. Through our experiments, we assess several modeling choices for structure-based models and their effects on downstream fitness prediction. We find that training on predicted structures can negatively affect downstream predictions when using experimental structures, that zero-shot fitness prediction models can struggle to learn the fitness landscape of proteins with disordered regions (lacking a fixed 3D structure), and that predicted structures for disordered regions can be misleading in this setting and affect predictive performance. Lastly, we evaluate an additional structure-based model on the ProteinGym substitution benchmark and show that simple multi-modal ensembles are strong baselines.
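The abstract does not say how the multi-modal ensembles are built. As a minimal sketch of one common baseline, assuming each model (for example a sequence-based and a structure-based one) outputs one zero-shot score per variant, the scores can be standardized per model and averaged, then compared with measured fitness using a rank correlation. The function names `zscore`, `ensemble_scores`, `average_ranks`, and `spearman` are hypothetical, and both the averaging scheme and the choice of Spearman correlation are assumptions rather than the authors' stated method.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Standardize one model's scores so different models are comparable."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        # A constant score vector carries no ranking information.
        return [0.0 for _ in xs]
    return [(x - mu) / sd for x in xs]


def ensemble_scores(per_model_scores):
    """Average z-scored predictions across models, variant by variant.

    per_model_scores: list of equal-length score lists, one per model.
    (Assumed ensembling rule; not taken from the paper.)
    """
    standardized = [zscore(s) for s in per_model_scores]
    return [mean(col) for col in zip(*standardized)]


def average_ranks(xs):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (vx * vy) ** 0.5
```

With per-variant scores from two models in `seq_scores` and `struct_scores` and measured fitness in `fitness` (all hypothetical inputs), `spearman(ensemble_scores([seq_scores, struct_scores]), fitness)` gives the ensemble's rank correlation, which can be compared against each model evaluated on its own.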
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 102