Biological Sequence with Language Model Prompting: A Survey

ACL ARR 2025 May Submission6139 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Large language models (LLMs) have emerged as powerful tools for addressing challenges across diverse domains. Notably, recent studies have demonstrated that LLMs can substantially improve the efficiency of biomolecular analysis and synthesis, attracting growing attention in both academic research and medical applications. In this paper, we systematically investigate how LLMs, guided by prompt-based methodologies, can be applied to the analysis of biological sequences, including DNA, RNA, and proteins, and to tasks related to drug discovery. Specifically, we explore how prompt engineering enables LLMs to tackle domain-specific problems, such as promoter sequence prediction, protein structure modeling, and drug-target binding affinity prediction, often in scenarios with limited labeled data. Furthermore, our discussion highlights the transformative potential of prompting in bioinformatics while addressing key challenges such as data scarcity, multimodal fusion, and computational resource limitations. This paper is intended to serve both as a foundational resource for newcomers and as a springboard for ongoing innovation in this rapidly evolving field.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Survey, Prompting, Bioinformatics, NLP
Contribution Types: Surveys
Languages Studied: English
Submission Number: 6139