Keywords: protein structure prediction, protein language models, parameter-efficient training
Abstract: Predicting the structure of interacting chains is crucial for understanding biological systems and developing new drugs. Large-scale pre-trained protein language models (PLMs), such as ESM-2, have shown an impressive ability to extract biologically meaningful representations for protein contact and structure prediction. In this paper, we show that ESMFold, which has been successful in computing accurate atomic structures for single-chain proteins, can be adapted to predict heterodimer structures in a lightweight manner.
We propose Linker-tuning, which learns a continuous prompt that connects the two chains of a dimer so that the complex can be folded as a single sequence by ESMFold.
Experimental results show that our method significantly outperforms the ESMFold-Linker baseline, with relative improvements of +28.13\% and +54.55\% in DockQ score on the i.i.d. heterodimer test set and the out-of-distribution (OOD) test set HeteroTest2, respectively. Notably, on the antibody heavy-chain/light-chain (VH-VL) test set, our method successfully predicts all heavy-chain/light-chain docking interfaces, with 46/68 medium-quality and 22/68 high-quality predictions, while being $9\times$ faster than AF-Multimer.
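As a rough illustration of the idea described in the abstract, the sketch below joins the per-residue embeddings of two chains with a trainable "soft" linker (a continuous prompt) so that the pair can be processed as one sequence. All names and sizes here (`embed_dim`, `linker_len`, the toy chain lengths) are illustrative assumptions, not the paper's actual implementation or ESMFold's API.

```python
import numpy as np

# Hypothetical sketch of Linker-tuning: instead of a hard-coded residue
# linker, a trainable soft-prompt embedding is inserted between the two
# chain embeddings, and the joined sequence is folded as a single chain.
rng = np.random.default_rng(0)
embed_dim = 8      # toy embedding width (real PLM embeddings are much larger)
linker_len = 25    # assumed number of soft-prompt positions in the linker

# Per-residue embeddings for the two chains of a heterodimer (toy sizes).
chain_a = rng.standard_normal((10, embed_dim))
chain_b = rng.standard_normal((12, embed_dim))

# The continuous prompt: in the real method these are trainable parameters
# updated by backpropagation; here they are just randomly initialized.
soft_linker = rng.standard_normal((linker_len, embed_dim)) * 0.02

# Concatenate into one "single-chain" input for the folding model.
joined = np.concatenate([chain_a, soft_linker, chain_b], axis=0)
print(joined.shape)  # (10 + 25 + 12, embed_dim) = (47, 8)
```

Only the linker parameters would be trained, which is what makes the adaptation lightweight relative to fine-tuning the full folding model.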
Supplementary Material: zip
Submission Number: 2544