Large-Scale Extraction of Composition and Properties from Materials Tables

NeurIPS 2024 Workshop AI4Mat Submission 82 Authors

Published: 03 Nov 2024, Last Modified: 09 Dec 2024, AI4Mat-NeurIPS-2024, CC BY 4.0
Submission Track: Findings & Open Challenges
Submission Category: AI-Guided Design
Keywords: Information extraction, materials table, table data extraction, database
TL;DR: We present a framework for automated, large-scale extraction of a knowledge base from materials tables.
Abstract: In this study, we aim to develop the largest automated knowledge base (KB) of inorganic materials’ compositions and properties by systematically extracting data from research articles published in the materials science (MatSci) domain. Since most material compositions and properties are reported in tables, extracting them efficiently is essential for building large-scale knowledge repositories in this field. To this end, we developed a framework combining two models, DISCOMAT and PEGAMAT, which extract materials’ compositions and properties, respectively. Training data was generated through distant supervision, using compositions and desired properties from existing databases together with the corresponding journal articles, and was supplemented by rule-based methods. Validation and test datasets were manually annotated by materials science experts. DISCOMAT achieved an F1 score of 71.49 for composition extraction, while PEGAMAT attained an F1 score of 86.90 for property extraction. We processed research papers published in 12 journals from the ScienceDirect database and extracted more than 550,000 entries: around 100,000 glass compositions paired with their properties, along with 137,000 compositions and 316,000 properties without their counterparts. The proposed models and the resulting database offer significant potential to advance the modeling and development of tailored materials.
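The abstract states that training data was generated through distant supervision using compositions from existing databases. As a rough illustration only, the sketch below shows one common form of distant supervision for composition tables: a table row is weakly labeled as a composition when its component amounts match a known database entry within a tolerance. This is not the DISCOMAT or PEGAMAT implementation; the function names (`match_row`, `distant_labels`), the assumption that amounts are in mol%, the 0.5 tolerance, and the example data are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical: a composition maps component names (e.g. oxides) to amounts, assumed mol%.
Composition = Dict[str, float]


def match_row(row: Composition, database: Dict[str, Composition],
              tol: float = 0.5) -> Optional[str]:
    """Return the ID of the first database entry with exactly the same non-zero
    components as `row` and every amount within `tol`, else None."""
    for entry_id, ref in database.items():
        if set(row) != set(ref):
            continue  # different component sets cannot match
        if all(abs(row[c] - ref[c]) <= tol for c in ref):
            return entry_id
    return None


def distant_labels(header: List[str], rows: List[List[str]],
                   database: Dict[str, Composition],
                   tol: float = 0.5) -> List[Optional[str]]:
    """Weakly label each table row with a matching database ID (or None)."""
    known = set().union(*database.values())  # all component names seen in the database
    labels: List[Optional[str]] = []
    for cells in rows:
        row: Composition = {}
        for name, cell in zip(header, cells):
            if name not in known:
                continue  # skip sample IDs and property columns such as Tg
            try:
                value = float(cell)
            except ValueError:
                continue  # unparseable cell, e.g. "-" or a footnote marker
            if value != 0.0:
                row[name] = value  # zero amounts are treated as absent components
        labels.append(match_row(row, database, tol))
    return labels


# Hypothetical reference database and extracted table (illustrative values only).
database = {
    "DB-1": {"SiO2": 70.0, "Na2O": 15.0, "CaO": 15.0},
    "DB-2": {"SiO2": 60.0, "B2O3": 40.0},
}
header = ["Sample", "SiO2", "Na2O", "CaO", "B2O3", "Tg (°C)"]
rows = [
    ["G1", "70", "15", "15", "0", "560"],
    ["G2", "60", "0", "0", "40", "520"],
    ["G3", "50", "25", "25", "0", "480"],
]

labels = distant_labels(header, rows, database)
print(labels)  # ['DB-1', 'DB-2', None]
```

Requiring exact agreement on the set of non-zero components keeps false matches low, at the cost of missing rows reported in different units (for example wt% instead of mol%) or with differently named components; a real pipeline would likely need normalization for both.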
AI4Mat Journal Track: Yes
Submission Number: 82