Abstract: Large language models (LLMs) offer promising opportunities to accelerate scientific discovery. However, deploying LLMs to answer scientific questions in specialized interdisciplinary domains, such as single-molecule electronics, poses challenges arising from the uniqueness of domain-specific data, the complexity of domain knowledge, and the specificity of domain objectives. To address this gap, we propose SMELLM, a paradigm for integrating domain knowledge from single-molecule electronics into LLMs via the retrieval-augmented generation (RAG) framework. Evaluation results show that SMELLM achieves higher SciBERT scores than GPT and ChatGPT, with SMELLM-4.0 notably reaching a SciBERT score of 0.731 and a Faithfulness score of 0.916. The responses generated by SMELLM are firmly grounded in domain-specific facts, indicating substantial gains in LLM capability on domain-specific natural language understanding tasks. Furthermore, SMELLM can be adapted to enhance and evaluate LLM proficiency in other scientific domains at low computational cost.
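Since the paper's code is not reproduced on this page, the following is a minimal sketch of the retrieval-then-generate loop the abstract describes, written in Python (the language the submission lists). The toy corpus, the bag-of-words retriever, and the `call_llm` stub are all illustrative assumptions, not SMELLM's actual implementation, which the paper builds on a RAG framework over single-molecule-electronics literature.

```python
# Minimal RAG sketch: retrieve the most relevant domain documents for a
# query, then prompt an LLM with that context. Illustrative only; the
# corpus, retriever, and call_llm stub are assumptions, not SMELLM code.
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(query_tokens: list[str], doc_tokens: list[str]) -> float:
    # Cosine similarity over raw term counts; a stand-in for the dense
    # embedding retriever a production RAG system would use.
    q, d = Counter(query_tokens), Counter(doc_tokens)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    qt = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: similarity(qt, tokenize(doc)),
                    reverse=True)
    return ranked[:k]


def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call (e.g., a GPT chat endpoint).
    return "[LLM answer grounded in the retrieved context]"


def answer(query: str, corpus: list[str]) -> str:
    context = "\n".join(retrieve(query, corpus))
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return call_llm(prompt)


if __name__ == "__main__":
    corpus = [
        "Single-molecule junctions are formed with break-junction techniques.",
        "Conductance histograms summarize repeated junction measurements.",
        "Large language models are trained on general web text.",
    ]
    print(answer("How are single-molecule junctions measured?", corpus))
```

Grounding the prompt in retrieved domain passages, rather than fine-tuning the model, is what keeps the approach cheap in compute, consistent with the abstract's claim of low resource consumption.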
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Python