MatKG-2: Unveiling precise material science ontology through autonomous committees of LLMs

Published: 27 Oct 2023, Last Modified: 06 Dec 2023, AI4Mat-2023 Poster
Submission Track: Papers
Submission Category: All of the above
Keywords: Natural Language Processing, AI, Large Language Models, Materials Science, Materials Informatics
TL;DR: We describe an autonomous knowledge graph generation pipeline that uses a committee of large language models.
Abstract: This paper introduces MatKG-2, a materials science knowledge graph generated autonomously by a pipeline driven by large language models (LLMs). Building on the groundwork of MatKG, MatKG-2 uses a novel 'committee of large language models' approach to extract knowledge triples and classify them according to an established ontology. Whereas the previous version relied on statistical co-occurrence, MatKG-2 captures more nuanced, ontology-based relationships. Because the pipeline uses open LLMs such as Llama 2 7b and BLOOM 1b/7b, the study supports reproducibility and broad community engagement. Using 4-bit and 8-bit quantized versions of these models for fine-tuning and inference also makes MatKG-2 more computationally tractable and therefore compatible with most commercially available GPUs. Our work highlights the potential of MatKG-2 to support materials science data infrastructure and to contribute to the semantic web.
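The abstract does not say how the committee reaches a decision. As a rough illustration only, the sketch below assumes simple majority voting: each committee member (in practice, presumably a quantized LLM) proposes a relation label for a candidate (head, tail) pair, and a triple is kept only when more than half of the members agree on the same valid, non-`none` label. The relation names, `committee_classify`, `Triple`, and the stub members are all hypothetical and are not MatKG-2's actual ontology or code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical ontology relation labels, for illustration only
# (NOT the MatKG-2 schema). "none" means "no relation holds".
RELATIONS = ("has_property", "has_application", "synthesized_by",
             "characterized_by", "none")


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


# A committee member maps (sentence, head, tail) to a relation label.
# In a real pipeline each member might wrap an LLM; here it is any callable.
Member = Callable[[str, str, str], str]


def committee_classify(sentence: str, head: str, tail: str,
                       members: Sequence[Member],
                       min_agreement: float = 0.5) -> Optional[Triple]:
    """Keep a triple only if the share of members voting for the top
    label exceeds min_agreement and that label is a real relation."""
    votes = [member(sentence, head, tail) for member in members]
    # Labels outside the ontology are treated as abstentions.
    valid = [v for v in votes if v in RELATIONS]
    if not valid:
        return None
    label, count = Counter(valid).most_common(1)[0]
    # Agreement is measured against the whole committee, so abstentions
    # count against acceptance.
    if label == "none" or count / len(members) <= min_agreement:
        return None
    return Triple(head, label, tail)


# Usage with stub members standing in for LLMs:
members = [
    lambda s, h, t: "has_property",
    lambda s, h, t: "has_property",
    lambda s, h, t: "has_application",
]
sentence = "TiO2 exhibits a wide band gap."
print(committee_classify(sentence, "TiO2", "wide band gap", members))
# → Triple(head='TiO2', relation='has_property', tail='wide band gap')
```

Requiring agreement from a majority of the whole committee trades recall for precision: a triple that only one member proposes is discarded, which is one plausible way a committee could filter out individual models' extraction errors.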
Digital Discovery Special Issue: Yes
Submission Number: 47