Track: Machine learning: computational method and/or computational results
Nature Biotechnology: Yes
Keywords: Protein function prediction, Benchmarking, Disordered proteins, Structure-based function prediction
TL;DR: We evaluate protein structure-based models for function prediction. We examine how function prediction varies when using experimental vs. predicted structures, on proteins with intrinsically disordered regions (IDRs), and when fusing multiple modalities.
Abstract: The ability to make zero-shot predictions about the fitness consequences of protein sequence changes with pre-trained machine learning models enables many practical applications. Such models can be applied to downstream tasks like genetic variant interpretation and protein engineering without additional labeled data. The advent of capable protein structure prediction tools has led to the availability of orders of magnitude more precomputed predicted structures, giving rise to powerful structure-based fitness prediction models. Through our experiments, we assess several modeling choices for structure-based models and their effects on downstream fitness prediction. We find that training on predicted structures can negatively affect downstream predictions when using experimental structures, that zero-shot fitness prediction models can struggle to learn the fitness landscape of proteins with disordered regions (lacking a fixed 3D structure), and that predicted structures for disordered regions can be misleading in this setting and affect predictive performance. Lastly, we evaluate an additional structure-based model on the ProteinGym substitution benchmark and show that simple multi-modal ensembles are strong baselines.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 102