Abstract: Large Language Models (LLMs) typically exhibit knowledge gaps in specialized applications because they are pre-trained on general-purpose textual corpora. Although fine-tuning and modality alignment aim to bridge these gaps, their incomplete knowledge coverage leads LLMs to deliver imprecise responses. To address these challenges, we introduce Domain-specific Retrieval-Augmented Knowledge (DRAK), a scalable and adaptable non-parametric knowledge injection framework designed to strengthen LLMs' knowledge reasoning through in-context examples. DRAK combines retrieval augmentation with structured knowledge-graph recall of high-quality instances, using the retrieved examples to unlock LLMs' context-aware molecular learning capabilities and offering a general solution for specific domains. We validate DRAK's effectiveness and generalizability in the biomolecular domain, achieving superior performance across twelve tasks involving both molecule-oriented and bioinformatics texts within the Mol-Instructions dataset. DRAK's ability to unearth molecular insights establishes a standardized approach for LLMs navigating knowledge-intensive challenges.
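To make the non-parametric knowledge injection pattern concrete, the sketch below illustrates the generic retrieval-augmented prompting loop the abstract describes: embed a query, retrieve the most similar annotated domain instances, and prepend them as in-context demonstrations. This is a minimal illustration under stated assumptions, not DRAK's implementation; the `embed` function, the toy corpus, and the prompt layout are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical placeholder encoder: in practice a real sentence-embedding
# model would be used. Here tokens are hashed into a fixed-size bag-of-words
# vector purely so the sketch stays self-contained and runnable.
def embed(text: str, dim: int = 64) -> np.ndarray:
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Toy corpus of instruction/answer instances; DRAK instead draws high-quality
# instances from a curated store and a structured knowledge graph.
CORPUS = [
    {"q": "Describe the properties of aspirin", "a": "...expert annotation..."},
    {"q": "Predict the solubility of caffeine", "a": "...expert annotation..."},
    {"q": "Name the protein family of p53", "a": "...expert annotation..."},
]
CORPUS_EMB = np.stack([embed(ex["q"]) for ex in CORPUS])

def build_prompt(query: str, k: int = 2) -> str:
    """Retrieve the k most similar instances and prepend them as demonstrations."""
    sims = CORPUS_EMB @ embed(query)  # cosine similarity (vectors are unit-norm)
    top = np.argsort(-sims)[:k]
    demos = "\n\n".join(f"Q: {CORPUS[i]['q']}\nA: {CORPUS[i]['a']}" for i in top)
    return f"{demos}\n\nQ: {query}\nA:"

# The assembled prompt would then be passed to the frozen LLM.
print(build_prompt("Describe the properties of ibuprofen"))
```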