Keywords: Protein Design, Protein Evolution, Protein Representation Learning, Multimodal Learning, Large Language Models
TL;DR: We benchmark multimodal foundation models on text-guided mutation and text-guided function design.
Abstract: Language models have demonstrated efficacy in protein design by capturing the distribution of amino acid sequences and structures. To advance protein representation learning, biomedical textual descriptions have been integrated as an additional modality, complementing existing sequence and structure information. The textual modality is crucial because it provides insight into the detailed molecular functions and cellular contexts in which proteins operate. Existing deep learning methods have built foundation models on this modality, targeting challenging protein design tasks such as text-to-protein generation and text-guided protein editing. However, whether these multiple modalities can be used to model natural protein evolution remains an open question. In this work, we introduce two tasks: text-guided point mutation and text-guided Enzyme Commission (EC) number switching. These tasks enable a preliminary exploration of how far a multimodal foundation model can go in understanding the protein evolution process. We assess existing language models on both tasks. Our results show that structure-based models outperform sequence-based ones by 24\% in structure-oriented evolution tasks, despite exhibiting significant biases. We also find that models using free-form text design enzyme functions more effectively, achieving a 30.06\% closer alignment to target functions by integrating evolutionary context.
Submission Number: 120