SMELLM: A Paradigm for Integrating Domain Knowledge into LLMs via Retrieval-Augmented Generation

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Large language models (LLMs) offer promising opportunities to expedite scientific discovery. However, deploying LLMs to answer scientific questions in specific interdisciplinary research domains, such as single-molecule electronics, poses challenges arising from the uniqueness of domain-specific data, the complexity of domain knowledge, and the specificity of domain objectives. To address this gap, we propose SMELLM, a paradigm for integrating domain knowledge from single-molecule electronics into LLMs using the retrieval-augmented generation (RAG) framework. Evaluation results demonstrate that SMELLM achieves higher SciBERT scores than GPT and ChatGPT, with SMELLM-4.0 notably reaching a SciBERT score of 0.731 and a Faithfulness score of 0.916. The responses generated by SMELLM are firmly grounded in domain-specific facts, indicating substantial gains in LLM capability on domain-specific natural language understanding tasks. Furthermore, SMELLM can be adapted to enhance and evaluate LLM proficiency in other scientific domains at low computational cost.
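
The abstract does not disclose implementation details, but the paradigm it describes (retrieve domain passages, then condition an LLM on them) can be illustrated with a minimal RAG sketch. Everything below is an assumption for illustration: the encoder choice, the two-sentence toy corpus, the `retrieve`/`answer` helpers, and the pairing of "SMELLM-4.0" with a GPT-4 backend are all hypothetical, not the authors' implementation.

```python
# Minimal RAG sketch in the spirit of SMELLM (illustrative only).
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# 1. Embed a small domain corpus (toy stand-in for a single-molecule
#    electronics literature collection; the real corpus is not public).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
corpus = [
    "Break-junction techniques measure single-molecule conductance.",
    "Anchoring groups such as thiols bind molecules to gold electrodes.",
]
corpus_emb = encoder.encode(corpus, normalize_embeddings=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k corpus passages most similar to the question."""
    q_emb = encoder.encode([question], normalize_embeddings=True)
    scores = corpus_emb @ q_emb.T  # cosine similarity (unit vectors)
    idx = np.argsort(scores[:, 0])[::-1][:top_k]
    return [corpus[i] for i in idx]

def answer(question: str) -> str:
    """Ground the LLM's answer in the retrieved domain passages."""
    context = "\n".join(retrieve(question))
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",  # assumption: SMELLM-4.0 suggests a GPT-4 backend
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

Grounding the prompt in retrieved passages is what the evaluation rewards: faithfulness-style metrics score how well the response is supported by the retrieved context, which is consistent with the reported Faithfulness score of 0.916.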
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Python