Data Efficient Training for Materials Property Prediction Using Active Learning Querying

Published: 27 Oct 2023, Last Modified: 03 Dec 2023, Venue: AI4Mat-2023 Poster
Submission Track: Findings
Submission Category: AI-Guided Design
Keywords: machine learning, active learning, data efficient
TL;DR: This paper trains machine learning models for materials property prediction on less data by using active learning querying to select informative samples, and compares this approach to data subset selection and to standard training on the full dataset.
Abstract: The field of machine learning for materials property prediction and characterization is seeing rapid developments in models, datasets, and frameworks. As datasets and models grow in size, frameworks must mature concurrently to match the data requirements and quick development cycles needed to support these growing workloads. The efficient training of models is one area where machine learning frameworks may be improved. Using active learning querying strategies to train models from scratch on less data can lead to faster development cycles and model evaluations and to reduced training costs. Well-studied active learning querying strategies from computer vision and natural language processing are directly applied to train an E(n)-GNN model from scratch using a subset of the Materials Project Database and Novel Materials Discovery (NOMAD) Database, with the results compared to data subset selection techniques and the standard training pipeline. In general, the models trained with active learning querying strategies meet or exceed the performance of models trained with the standard pipeline while using significantly less training data.
Submission Number: 25