Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature
Submission Track: LLMs for Materials Science - Full Paper
Submission Category: Automated Material Characterization
Keywords: LLM, knowledge extraction, synthesis details, reticular material, scientific literature
TL;DR: Exploring the use of open-source LLMs to extract knowledge from scientific literature
Abstract: Automated knowledge extraction from scientific literature can potentially accelerate
materials discovery. We have investigated an approach for extracting synthesis
protocols for reticular materials from scientific literature using large language
models (LLMs). To that end, we introduce a Knowledge Extraction Pipeline (KEP)
that automates LLM-assisted paragraph classification and information extraction.
By applying prompt engineering with in-context learning (ICL) to a set of open-
source LLMs, we demonstrate that LLMs can retrieve chemical information from
PDF documents, without the need for fine-tuning or training and at a reduced risk
of hallucination. By comparing the performance of five open-source families of
LLMs in both paragraph classification and information extraction tasks, we observe
excellent model performance even when only a few example paragraphs are included in
the ICL prompts. The results show the potential of the KEP approach for reducing
human annotations and data curation efforts in automated scientific knowledge
extraction.
AI4Mat Journal Track: Yes
Submission Number: 56