SMELLM: A Paradigm for Integrating Domain Knowledge into LLMs via Retrieval-Augmented Generation

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Large language models (LLMs) offer promising opportunities to expedite scientific discovery. However, deploying LLMs to answer scientific questions in specific interdisciplinary research domains, such as single-molecule electronics, poses challenges arising from the uniqueness of domain-specific data, the complexity of domain knowledge, and the specificity of domain objectives. To address this gap, we propose SMELLM, a paradigm for integrating domain knowledge from single-molecule electronics into LLMs using the retrieval-augmented generation (RAG) framework. Evaluation results demonstrate that SMELLM achieves higher SciBERT scores than GPT and ChatGPT, with SMELLM-4.0 notably reaching a SciBERT score of 0.731 and a Faithfulness score of 0.916. The responses generated by SMELLM are firmly grounded in domain-specific facts, indicating substantial gains in LLM capability on domain-specific natural language understanding tasks. Furthermore, SMELLM can be adapted to enhance and evaluate LLM proficiency in other scientific domains at low computational cost.
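
The abstract does not disclose implementation details, but the paradigm it describes (retrieve domain passages, then condition an LLM on them) can be illustrated with a minimal RAG sketch. Everything below is an assumption for illustration: the encoder choice, the two-sentence toy corpus, the `retrieve`/`answer` helpers, and the pairing of "SMELLM-4.0" with a GPT-4 backend are all hypothetical, not the authors' implementation.

```python
# Minimal RAG sketch in the spirit of SMELLM (illustrative only).
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# 1. Embed a small domain corpus (toy stand-in for a single-molecule
#    electronics literature collection; the real corpus is not public).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
corpus = [
    "Break-junction techniques measure single-molecule conductance.",
    "Anchoring groups such as thiols bind molecules to gold electrodes.",
]
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k corpus passages most similar to the question."""
    q_emb = encoder.encode([question], normalize_embeddings=True)
    scores = corpus_emb @ q_emb.T  # cosine similarity (unit vectors)
    idx = np.argsort(scores[:, 0])[::-1][:top_k]
    return [corpus[i] for i in idx]

def answer(question: str) -> str:
    """Ground the LLM's answer in the retrieved domain passages."""
    context = "\n".join(retrieve(question))
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",  # assumption: SMELLM-4.0 suggests a GPT-4 backend
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

Grounding the prompt in retrieved passages is what the evaluation rewards: faithfulness-style metrics score how well the response is supported by the retrieved context, which is consistent with the reported Faithfulness score of 0.916.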
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Python