STELLA: Leveraging Structural Representations to Enhance Protein Understanding with Multimodal LLMs

Hongwang Xiao; Wenjun Lin; Xi Chen; Hui Wang; Kai Chen; Jiashan Li; Yuancheng SUN; Sicheng Dai; Boya Wu; Qiwei Ye

28 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Protein Function Prediction, Enzyme-Catalyzed Reaction Prediction, Multimodal Large Language Models, Structural Representations, Protein Biology, Computational Biology

Abstract: Protein biology centers on the intricate relationships among sequence, structure, and function (text), and understanding structure is crucial for uncovering a protein's biological functions. Traditional methods based on protein language models (pLMs) often focus on specific aspects of biological function prediction but do not account for the broader, dynamic context of protein research, which is an important component for addressing the complexity of protein biology. Modern large language models (LLMs) excel at human-machine interaction and at language understanding and generation, performing at a human-like level. By bridging structural representations with the contextual knowledge encoded within LLMs, STELLA leverages the strengths of LLMs to enable versatile and accurate predictions in protein-related tasks. It achieves state-of-the-art performance in both functional description and enzyme-catalyzed reaction prediction tasks, showcasing the transformative potential of multimodal LLMs as a novel paradigm alongside pLMs for advancing protein biology research. This study not only establishes an innovative LLM-based paradigm for understanding proteins but also expands the boundaries of LLM capabilities in protein biology. To foster collaboration and inspire further innovation, the code, datasets, and pre-trained models are made publicly available at the anonymous GitHub repository https://anonymous.4open.science/r/STELLA-DF00.

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 13701
