Keywords: protein language model, multi-modal, parameter-efficient fine-tuning
TL;DR: This paper proposes a multimodal fine-tuning framework called InstructPLM-mu, which fine-tunes ESM2 to the performance level of ESM3 on zero-shot protein mutation-effect prediction.
Abstract: Multimodal protein language models deliver strong performance on mutation-effect prediction, but training such models from scratch demands substantial computational resources.
In this paper, we propose a fine-tuning framework called InstructPLM-mu and use it to answer a question: \textit{Can multimodal fine-tuning of a pretrained, sequence-only protein language model match the performance of models trained end-to-end?}
Surprisingly, our experiments show that fine-tuning ESM2 with structural inputs can reach performance comparable to that of ESM3.
To understand how this is achieved, we systematically compare three different feature-fusion designs and fine-tuning recipes.
Our results reveal that both the fusion method and the tuning strategy strongly affect final accuracy, indicating that the fine-tuning process is not trivial.
We hope this work offers practical guidance for injecting structure into pretrained protein language models and motivates further research on better fusion mechanisms and fine-tuning protocols.
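As a rough illustration of what such structure injection can look like (a minimal sketch, not the architecture used in the paper; the module name, feature dimensions, and cross-attention design are assumptions), one common fusion pattern projects per-residue structure features into the language model's hidden width and merges them into the frozen sequence representations through a small trainable adapter:

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hypothetical adapter that injects per-residue structure features
    into a frozen sequence-only PLM via cross-attention (illustrative only)."""

    def __init__(self, d_model: int = 1280, d_struct: int = 384, n_heads: int = 8):
        super().__init__()
        # Map structure features to the PLM hidden width.
        self.struct_proj = nn.Linear(d_struct, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, seq_hidden: torch.Tensor, struct_feats: torch.Tensor) -> torch.Tensor:
        # seq_hidden:   (B, L, d_model)  hidden states from the pretrained sequence model
        # struct_feats: (B, L, d_struct) per-residue features from some structure encoder
        struct_tokens = self.struct_proj(struct_feats)
        fused, _ = self.cross_attn(query=seq_hidden, key=struct_tokens, value=struct_tokens)
        # Residual connection keeps the pretrained sequence pathway intact.
        return self.norm(seq_hidden + fused)

In a parameter-efficient setup, only adapters like this (plus any projection layers) would be trained while the pretrained backbone stays frozen, which is one way to keep the compute cost far below training a multimodal model from scratch.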
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 1199