Bridging Text and Molecule: A Survey on Language-molecule Models

ACL ARR 2025 February Submission3573 Authors

15 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Artificial intelligence has demonstrated immense potential in scientific research. Within molecular science, it is revolutionizing the traditional computer-aided paradigm, ushering in a new era of deep learning. With recent progress in multimodal learning and natural language processing, an emerging trend is to build multimodal frameworks that jointly model molecules with textual domain knowledge, known as language-molecule models. In this paper, we present the first systematic survey on language-molecule models. Specifically, we begin with the development of molecular deep learning and point out the necessity of incorporating the textual modality. Next, we focus on recent advances in text-molecule alignment methods, categorizing current models based on their architectures and listing relevant pre-training tasks. Furthermore, we delve into the utilization of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery. Finally, we discuss the limitations in this field and highlight several promising directions for future research.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: healthcare applications, cross-modal application
Contribution Types: Surveys
Languages Studied: English
Submission Number: 3573