Tree-based Quantile Active Learning for automated discovery of MOFs

Published: 27 Oct 2023, Last Modified: 11 Dec 2023, AI4Mat-2023 Poster
Submission Track: Papers
Submission Category: AI-Guided Design + Automated Chemical Synthesis
Keywords: Quantile Active Learning, Query-based Learning, Automated materials discovery, Metal Organic Frameworks
Supplementary Material: pdf
Abstract: Metal-organic frameworks (MOFs), formed through coordination bonds between metal ions and organic ligands, are promising materials for efficient gas adsorption due to their ultrahigh porosity, chemical tunability, and large surface area. Because over a hundred thousand hypothetical MOFs have been reported to date, brute-force discovery of the best-performing MOF for a specific application is not feasible. Machine learning models that predict material properties have recently played a crucial role in screening large databases, but they often require large labeled training sets, which are not always available. To address this, active learning, in which the training set is built iteratively by querying only informative labels, is necessary. Moreover, in most applications only a specific range of the property of interest is desirable. We employ a novel regression-tree-based quantile active learning algorithm that uses the partitions of a regression tree to select new samples to add to the training set. It thereby limits the sample size while maximizing prediction quality over a quantile of interest. Tests on benchmark MOF data sets demonstrate that focusing on a specific quantile is effective for learning regression models that predict electronic band gaps and CO$_2$ adsorption in the regions of interest from a very limited labeled data set.
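The abstract does not state the exact selection rule, so the following is only a minimal, hypothetical Python sketch of the general idea of using regression-tree partitions to focus queries on a quantile. Everything in it is an assumption for illustration, not the authors' algorithm: the function names (`build_tree`, `leaf_of`, `quantile_query`), the parameters `q` and `budget`, targeting the upper tail (e.g. high CO$_2$ uptake), weighting each candidate leaf by its pool size times the standard deviation of its labeled responses, and sampling uniformly within leaves. A small CART-style tree is written out in plain Python so the sketch has no dependencies; in practice a library tree (for example scikit-learn's `DecisionTreeRegressor`, whose `apply` method returns leaf indices) would be the natural choice.

```python
# Hypothetical sketch of tree-based quantile active learning; NOT the authors' published algorithm.
import random
from statistics import mean, pvariance


def build_tree(X, y, idx, depth=0, max_depth=3, min_leaf=2):
    """Grow a small CART-style regression tree on the labeled rows `idx`,
    choosing each split to minimise the summed within-child squared error."""
    if depth == max_depth or len(idx) < 2 * min_leaf:
        return {"leaf": True, "value": mean(y[i] for i in idx)}
    best = None
    for f in range(len(X[0])):
        for t in sorted({X[i][f] for i in idx}):
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = (len(left) * pvariance([y[i] for i in left])
                    + len(right) * pvariance([y[i] for i in right]))
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # no admissible split: this node becomes a leaf
        return {"leaf": True, "value": mean(y[i] for i in idx)}
    _, f, t, left, right = best
    return {"leaf": False, "f": f, "t": t,
            "l": build_tree(X, y, left, depth + 1, max_depth, min_leaf),
            "r": build_tree(X, y, right, depth + 1, max_depth, min_leaf)}


def leaf_of(tree, x):
    """Return (leaf id as an 'L'/'R' path, leaf prediction) for feature vector x."""
    node, path = tree, ""
    while not node["leaf"]:
        if x[node["f"]] <= node["t"]:
            node, path = node["l"], path + "L"
        else:
            node, path = node["r"], path + "R"
    return path, node["value"]


def quantile_query(X, y, labeled, pool, budget, q=0.9, seed=0):
    """Choose up to `budget` pool indices, restricted to tree leaves whose
    prediction reaches the empirical q-quantile of pool predictions (upper tail)."""
    rng = random.Random(seed)
    tree = build_tree(X, y, labeled)
    preds = {i: leaf_of(tree, X[i]) for i in pool}
    values = sorted(v for _, v in preds.values())
    thr = values[min(int(q * len(values)), len(values) - 1)]
    # Candidate pool points, grouped by the leaf (partition cell) they fall in.
    cells = {}
    for i, (leaf, v) in preds.items():
        if v >= thr:
            cells.setdefault(leaf, []).append(i)
    # Labeled responses per leaf, used as a rough per-cell uncertainty estimate.
    labels = {}
    for i in labeled:
        labels.setdefault(leaf_of(tree, X[i])[0], []).append(y[i])
    # Hypothetical weight: cell size x label standard deviation (tiny floor avoids zero weights).
    weights = {leaf: len(m) * (pvariance(labels[leaf]) ** 0.5 + 1e-9)
               for leaf, m in cells.items()}
    chosen = []
    while len(chosen) < budget and cells:
        names = list(cells)
        leaf = rng.choices(names, weights=[weights[n] for n in names])[0]
        members = cells[leaf]
        chosen.append(members.pop(rng.randrange(len(members))))
        if not members:
            del cells[leaf]
    return chosen
```

In a full active-learning loop, the returned indices would be sent to an oracle (e.g. a simulation or experiment) for labeling, moved from `pool` to `labeled`, and the tree refit before the next query round. Restricting queries to leaves predicted to lie in the quantile of interest is what concentrates the labeling budget on the region that matters, rather than spreading it over the whole property range.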
Submission Number: 89