Preference optimization of protein language models as a multi-objective binder design paradigm

Published: 04 Mar 2024, Last Modified: 29 Apr 2024 · GEM Poster · CC BY 4.0
Track: Machine learning: computational method and/or computational results
Keywords: multi-objective design, protein language models, direct preference optimization, binder design, protein engineering
TL;DR: Direct preference optimization offers a viable paradigm for building multi-objective binder design frameworks based on protein language models.
Abstract: We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization (DPO) of autoregressive protein language models (pLMs). Multiple design objectives are encoded in the language model through direct optimization on expert-curated preference sequence datasets comprising preferred and dispreferred distributions. We show that the proposed alignment strategy enables ProtGPT2 to effectively design binders conditioned on specified receptors and a drug developability criterion. Generated binder samples achieve median isoelectric point (pI) improvements of 17%-60%.
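The abstract's alignment step rests on the standard per-pair DPO objective, which scores a preferred sequence against a dispreferred one relative to a frozen reference model. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss.

    pi_logp_w / pi_logp_l: sequence log-likelihoods of the preferred (w)
    and dispreferred (l) sequences under the policy being trained.
    ref_logp_w / ref_logp_l: the same log-likelihoods under the frozen
    reference model (e.g. the instruction-tuned pLM before alignment).
    beta: strength of the implicit KL constraint to the reference.
    """
    # Margin between the policy's and reference's preference for w over l.
    logits = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log(sigmoid(logits)): small when the policy favors the preferred sequence.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference, the loss is log 2; it falls as the policy assigns relatively more likelihood to the preferred sequence, which is how multiple design objectives can be folded into the curated preference pairs.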
Submission Number: 78