ProteinRL: Reinforcement learning with generative protein language models for property-directed sequence design

Published: 27 Oct 2023, Last Modified: 21 Nov 2023, GenBio@NeurIPS2023 Poster
Keywords: protein language models, reinforcement learning, optimization
TL;DR: We developed ProteinRL, a policy-based reinforcement learning approach for fine-tuning generative protein language models for property-directed sequence generation and optimization
Abstract: The overarching goal of protein engineering is the design and optimization of proteins customized for specific purposes. Generative protein language models (PLMs) allow for de novo protein sequence generation; however, current PLMs lack the capability to controllably generate sequences tailored to desired properties. Here we present ProteinRL, a flexible, data-driven reinforcement learning framework for fine-tuning generative PLMs for the de novo design of sequences optimized for specific sequence and/or structural properties. We highlight two examples of realistic protein design goals: a single-objective design for sequences with unusually high charge content, and a multi-objective hit-expansion scenario that diversifies a target sequence with generated sequences having high-confidence structure predictions and high predicted probability of soluble expression. In both cases, ProteinRL fine-tuning guides the PLM towards generating sequences optimized for the defined properties, extending to values rarely or never seen in natural sequences or in sequences generated without ProteinRL fine-tuning. The demonstrated success and adaptability of the ProteinRL framework allow for the de novo design of protein sequences optimized for applications across many areas of protein engineering.
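Illustration: the abstract does not specify the reward definitions or the policy-gradient variant, so the following is only a minimal sketch of how the two design goals might be scored and fed into a policy-based update. Everything here is an assumption made for illustration, not the authors' implementation: the names `charge_fraction`, `combined_reward`, and `reinforce_loss` are hypothetical, charged residues are taken to be D, E, K, and R (histidine excluded), the multi-objective reward is a simple weighted mean of scores assumed to lie in [0, 1], and the update is plain REINFORCE with a batch-mean baseline.

```python
# Hypothetical sketch; not the ProteinRL implementation.

# Assumption: "charged" means Asp, Glu, Lys, Arg (His is not counted).
CHARGED = frozenset("DEKR")


def charge_fraction(seq):
    """Single-objective reward: fraction of residues that are charged."""
    if not seq:
        return 0.0
    return sum(aa in CHARGED for aa in seq.upper()) / len(seq)


def combined_reward(scores, weights):
    """Multi-objective reward: weighted mean of per-property scores.

    `scores` and `weights` are dicts keyed by property name; each score
    is assumed to be pre-scaled to [0, 1] (for example a structure
    confidence and a predicted solubility probability).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[k] * scores[k] for k in weights) / total


def reinforce_loss(log_probs, rewards):
    """REINFORCE loss with a batch-mean baseline.

    `log_probs[i]` is the summed token log-probability of generated
    sequence i under the current policy, and `rewards[i]` is its reward.
    Minimizing this loss raises the likelihood of above-average samples.
    """
    if len(log_probs) != len(rewards) or not rewards:
        raise ValueError("need equal-length, non-empty inputs")
    baseline = sum(rewards) / len(rewards)
    weighted = [(r - baseline) * lp for lp, r in zip(log_probs, rewards)]
    return -sum(weighted) / len(rewards)
```

In an actual fine-tuning loop the log-probabilities would come from the PLM as differentiable tensors and the loss would be backpropagated; plain floats are used here only to keep the sketch self-contained.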
Submission Number: 13