Multi-Scale Protein Language Model for Unified Molecular Modeling

23 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Protein Pre-training, Unified Molecular Modeling
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose msESM (multi-scale ESM), which realizes multi-scale unified molecular modeling by pre-training on multi-scale code-switch protein sequences and describing relationships among residues and atoms with a multi-scale position encoding.
Abstract: Protein language models have shown great potential in protein engineering. However, current protein language models operate mainly at the residue scale and therefore cannot provide information at the atom scale. As a result, their power cannot be fully exploited in applications that involve both proteins and small molecules. In this paper, we propose msESM (multi-scale ESM), which realizes multi-scale unified molecular modeling by pre-training on multi-scale code-switch protein sequences and describing relationships among residues and atoms with a multi-scale position encoding. Experimental results show that msESM outperforms previous methods on protein-molecule tasks and is on par with the state of the art on protein-only and molecule-only tasks.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7182