Extrapolative Protein Design through Triplet-based Preference Learning

Published: 03 Jul 2024, Last Modified: 16 Jul 2024
Venue: ICML 2024 FM-Wild Workshop Poster
License: CC BY 4.0
Keywords: Protein design, Protein Language Models, Preference Learning, Extrapolative Biological Design
TL;DR: Steering protein design in the extrapolation region by learning from both pairs and triplets.
Abstract: Extrapolative protein design is a crucial task for automated drug discovery: the goal is to design proteins with higher fitness than any seen in training (e.g., higher stability or tighter binding affinity). Current state-of-the-art methods assume that one can safely steer protein design in the extrapolation region by learning from pairs alone. We hypothesize that (1) noisy pairs do not accurately approximate the gradient needed to improve fitness, and (2) it is challenging for models to learn higher-order relationships among designs (triplets, etc.) from noisy pairs alone. Motivated by the success of alignment in large language models, we develop an extrapolative protein design method based on triplet-based preference learning, which both better approximates the gradient and directly models the fitness ranking within triplets. We evaluate the model on designing AAV and GFP proteins and demonstrate that the proposed framework significantly improves the generative models' effectiveness in extrapolation tasks.
Submission Number: 49
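
To make the contrast between pair-based and triplet-based preference learning in the abstract concrete, the sketch below compares two standard ranking objectives. The abstract does not state the loss actually used in the paper, so this is only an illustration under assumptions: a Bradley-Terry loss stands in for pairwise learning and a Plackett-Luce loss stands in for directly modeling the rank order of a triplet. The function names and the scalar scores (for example, sequence log-likelihoods from a protein language model) are hypothetical.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def pairwise_loss(s_win, s_lose):
    """Bradley-Terry negative log-likelihood that the winner outranks the loser.

    Assumption: s_win and s_lose are scalar model scores for the fitter and
    the less fit design. This is an illustrative stand-in, not the paper's loss.
    """
    return logsumexp([0.0, s_lose - s_win])


def triplet_loss(s_best, s_mid, s_worst):
    """Plackett-Luce negative log-likelihood of the ranking best > mid > worst.

    The ranking is modeled as two sequential choices: pick the best design
    out of all three, then pick the middle design out of the remaining two.
    """
    first_choice = logsumexp([s_best, s_mid, s_worst]) - s_best
    second_choice = logsumexp([s_mid, s_worst]) - s_mid
    return first_choice + second_choice


if __name__ == "__main__":
    # Hypothetical scores for three designs ordered by measured fitness.
    print(pairwise_loss(2.0, 1.0))
    print(triplet_loss(2.0, 1.0, 0.0))
```

Under this formulation the second term of the triplet loss equals the pairwise loss on the lower two designs, while the first term adds a constraint that involves all three scores at once. That joint term is one way a triplet objective can encode higher-order ordering information that independent pairs do not.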