TL;DR: An activation editing method for Protein Language Models that steers sequence generation toward desired properties, with a downstream application to protein optimization.
Abstract: Protein Language Models (PLMs), pre-trained on extensive evolutionary data from natural proteins, have emerged as indispensable tools for protein design. While powerful, PLMs often struggle to produce proteins with precisely specified functionalities or properties due to inherent challenges in controlling their outputs. In this work, we investigate the potential of Activation Steering, a technique originally developed for controlling text generation in Large Language Models (LLMs), to direct PLMs toward generating protein sequences with targeted properties. We propose a simple yet effective method that employs activation editing to steer PLM outputs, and extend this approach to protein optimization through a novel editing site identification module. Through comprehensive experiments on lysozyme-like sequence generation and optimization, we demonstrate that our methods can be seamlessly integrated into both auto-encoding and autoregressive PLMs without requiring additional training. These results highlight a promising direction for precise protein engineering using foundation models.
Lay Summary: Designing new proteins with specific functions, such as improved stability or brightness, is a major goal in biotechnology and medicine. Today, scientists use powerful AI models trained on millions of natural proteins to help with this task. However, these models often struggle to create proteins with exactly the properties we want, usually requiring lots of trial and error.
In our research, we adapted a technique from text-generating AI models, called "activation steering," to guide protein-generating AI models more precisely. Instead of retraining the whole model or relying on special keywords, our method tweaks the model’s internal workings while it is generating new protein sequences. This allows us to nudge the model toward producing proteins with desired features, like higher stability or better solubility, without extra training or large datasets.
We also developed a way to identify which parts of a protein should be changed to achieve these goals. Our experiments show that this approach works across different types of protein AI models and for various important protein properties. This could make it much easier and faster to design proteins for medicine, industry, and research.
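As a rough illustration of the "activation steering" idea described above, the sketch below uses a common recipe from the LLM literature: take the difference between mean hidden activations of sequences with and without a property, then add a scaled copy of that vector to the hidden states during generation. This is a minimal sketch under stated assumptions, not the paper's implementation: the difference-of-means form, the function names `steering_vector` and `steer`, the scale `alpha`, and the random arrays standing in for PLM activations are all hypothetical.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Difference of mean activations (assumed recipe, not confirmed by the source).

    pos_acts, neg_acts: arrays of shape (num_sequences, hidden_dim) holding one
    pooled activation per sequence with / without the desired property.
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to every position's hidden state.

    hidden: array of shape (sequence_length, hidden_dim).
    """
    return hidden + alpha * vector

# Random stand-ins for real PLM activations (purely illustrative data).
rng = np.random.default_rng(0)
hidden_dim = 8
pos_acts = rng.normal(loc=1.0, size=(16, hidden_dim))
neg_acts = rng.normal(loc=0.0, size=(16, hidden_dim))

v = steering_vector(pos_acts, neg_acts)
hidden = rng.normal(size=(5, hidden_dim))
steered = steer(hidden, v, alpha=2.0)
```

In a real model the addition would typically be applied inside a chosen layer during the forward pass; no weights are updated, which is consistent with the training-free claim in the abstract.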
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Protein Language Model, Steering, Protein Engineering
Submission Number: 9199