Large Language Models for Molecular Biology: Bridging Computational Advances and Biomolecular Insights
Abstract: Large language models (LLMs) are transforming numerous sectors and are increasingly being explored to advance molecular biology by enabling computational analysis of biological language. However, the grammatical and semantic complexities of biomolecules present challenges for LLMs. This survey explores three key strategies to bridge this gap: (1) biological LLMs, which are pretrained on biological language to capture unimodal representations or multimodal (i.e., sequence-structure) relationships; (2) post-training adaptations, which refine natural LLMs through instruction tuning or retrieval-augmented generation; and (3) multimodal LLMs, which are capable of jointly processing biological and natural languages. In this work, we highlight the potential of multimodal LLMs that integrate biomolecular data with general and scientific-literature knowledge to enhance biological language processing, thereby accelerating molecular biology research while addressing these challenges.
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Molecular biology, Biological language, Natural language, Large language models, Multimodal large language models
Contribution Types: Surveys
Languages Studied: Natural language (English), Biological language (protein, DNA, RNA, small molecules)
Submission Number: 7858