Keywords: protein language models, direct preference optimization, deep mutational scans, experimental validation
TL;DR: We applied ChatGPT-style preference optimization to protein language models using 1M+ experimental stability measurements, restoring favorable scaling on mutation effect prediction and enabling better prediction of stabilizing mutations, which we experimentally validated.
Abstract: Protein language models (pLMs) demonstrate clear scaling laws for structure prediction but exhibit deteriorating performance on mutation effect prediction as model size increases. We present StableESM, which applies Direct Preference Optimization (DPO) to ESM2 using over 1 million experimental stability measurements across 1,000 protein domains. Analogous to the GPT-3 to ChatGPT transition, in which language models were aligned with human preferences to become more helpful, we show that preference alignment enables larger models to keep improving on mutation effect prediction while maintaining structure prediction capabilities. StableESM demonstrates improved zero-shot generalization to unseen protein domains and families, and to higher-order mutational effects. In a computational protein design campaign to engineer more stable variants of a multicopper oxidase that was unseen during preference alignment, StableESM identified promising designs that were both novel relative to the natural protein and distinct from what the original model would suggest. Moreover, experimental testing validated StableESM's predictions, with designed mutants performing at or above wild-type levels across normal, thermophilic, and extremophilic temperature conditions. This study shows that preference optimization for protein language models not only improves the base model's mutation effect predictions but also generalizes to unseen protein domains and families, and even reshapes the predicted fitness/stability landscape of a protein distant from, and absent from, the preference alignment dataset.
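The abstract describes applying DPO to ESM2 with stability-derived preference pairs but gives no implementation details here. As a minimal, illustrative sketch of the DPO objective adapted to sequence-level preferences (variable names, the beta value, and the use of summed per-sequence log-likelihoods are assumptions for illustration, not details taken from the paper):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_stable, policy_logp_unstable,
             ref_logp_stable, ref_logp_unstable, beta=0.1):
    """Direct Preference Optimization loss on preference pairs of sequences.

    Each argument is a tensor of per-sequence log-likelihoods (e.g. summed
    pseudo-log-likelihoods from a masked pLM such as ESM2). The "stable"
    member of each pair is the experimentally preferred (more stable) variant;
    the reference model is a frozen copy of the base model.
    """
    # Log-ratio of the trained policy vs. the frozen reference for each variant
    stable_logratio = policy_logp_stable - ref_logp_stable
    unstable_logratio = policy_logp_unstable - ref_logp_unstable

    # Bradley-Terry preference term: push the policy to rank the stable
    # variant above the unstable one, with beta controlling the KL strength
    logits = beta * (stable_logratio - unstable_logratio)
    return -F.logsigmoid(logits).mean()


# Toy usage with random log-likelihoods for a batch of 4 preference pairs
if __name__ == "__main__":
    torch.manual_seed(0)
    pol_s, pol_u = torch.randn(4), torch.randn(4)
    ref_s, ref_u = torch.randn(4), torch.randn(4)
    print(dpo_loss(pol_s, pol_u, ref_s, ref_u).item())
```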
Submission Number: 79