Keywords: protein language model, multi-modal, parameter-efficient fine-tuning
TL;DR: This paper proposes a multimodal fine-tuning framework called InstructPLM-mu, which fine-tunes ESM2 to the performance level of ESM3 on zero-shot protein mutation-effect prediction.
Abstract: Multimodal protein language models deliver strong performance on mutation-effect prediction, but training such models from scratch demands substantial computational resources.
In this paper, we propose a fine-tuning framework called InstructPLM-mu and use it to answer a question: \textit{Can multimodal fine-tuning of a pretrained, sequence-only protein language model match the performance of models trained end-to-end?}
Surprisingly, our experiments show that fine-tuning ESM2 with structural inputs can reach performance comparable to that of ESM3.
To understand how this is achieved, we systematically compare three different feature-fusion designs and fine-tuning recipes.
Our results reveal that both the fusion method and the tuning strategy strongly affect final accuracy, indicating that the fine-tuning process is not trivial.
We hope this work offers practical guidance for injecting structure into pretrained protein language models and motivates further research on better fusion mechanisms and fine-tuning protocols.
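As a rough illustration of what such structure injection can look like (a minimal sketch, not the architecture used in the paper; the module name, feature dimensions, and cross-attention design are assumptions), one common fusion pattern projects per-residue structure features into the language model's hidden width and merges them into the frozen sequence representations through a small trainable adapter:

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hypothetical adapter that injects per-residue structure features
    into a frozen sequence-only PLM via cross-attention (illustrative only)."""

    def __init__(self, d_model: int = 1280, d_struct: int = 384, n_heads: int = 8):
        super().__init__()
        # Map structure features to the PLM hidden width.
        self.struct_proj = nn.Linear(d_struct, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, seq_hidden: torch.Tensor, struct_feats: torch.Tensor) -> torch.Tensor:
        # seq_hidden:   (B, L, d_model)  hidden states from the pretrained sequence model
        # struct_feats: (B, L, d_struct) per-residue features from some structure encoder
        struct_tokens = self.struct_proj(struct_feats)
        fused, _ = self.cross_attn(query=seq_hidden, key=struct_tokens, value=struct_tokens)
        # Residual connection keeps the pretrained sequence pathway intact.
        return self.norm(seq_hidden + fused)

In a parameter-efficient setup, only adapters like this (plus any projection layers) would be trained while the pretrained backbone stays frozen, which is one way to keep the compute cost far below training a multimodal model from scratch.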
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 1199