Abstract: In experimental materials science, every measurement counts.
Discovering new compositionally complex materials is time-consuming and costly because of the large number of measurements required to screen the vast composition-property space.
To address this, there is a growing need for acceleration strategies that minimize data collection while maintaining acceptable surrogate-model accuracy.
Active learning can significantly reduce the number of labeled data points (measurements) required to train surrogate machine learning models while still achieving high predictive performance with low uncertainty.
However, a major challenge in active learning is the cold-start problem: how can informative initial points be selected when no labeled data are yet available?
We present and systematically evaluate multiple cold-start initialization strategies for active learning loops based on different existing ``cheap'' data modalities, as well as their multimodal combination.
These strategies provide diverse and representative starting points and lead to rapid model convergence.
Two acquisition functions, \textit{Uncertainty Sampling} (US) and \textit{Self-Adjusting Weighted Expected Improvement} (SAWEI), are compared for iterative point selection, automatically balancing exploration and exploitation.
Active learning is stopped dynamically by monitoring the normalized mean predictive variance of the surrogate model.
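As an illustration of the loop described above, the following is a minimal sketch of uncertainty sampling with a variance-based stopping rule, using a Gaussian process surrogate. The 1-D toy response, the seed-selection scheme, the unit prior variance used for normalization, and the threshold value are all illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Hypothetical 1-D "composition" axis with a toy resistance-like response.
X_pool = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y_pool = np.sin(6.0 * X_pool[:, 0]) + 0.05 * rng.standard_normal(200)

# Cold start: random seed points stand in for modality-informed initialization.
labeled = list(rng.choice(len(X_pool), size=5, replace=False))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-4)
prior_var = 1.0   # diagonal of the unit-variance RBF kernel, used to normalize
threshold = 0.05  # illustrative stopping threshold on normalized mean variance

for step in range(100):
    gp.fit(X_pool[labeled], y_pool[labeled])
    _, std = gp.predict(X_pool, return_std=True)
    norm_mean_var = (std ** 2).mean() / prior_var
    if norm_mean_var < threshold:
        break  # surrogate is confident enough: stop measuring
    # Uncertainty sampling: query the unlabeled point with the largest std.
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    labeled.append(int(unlabeled[np.argmax(std[unlabeled])]))

print(f"stopped after {len(labeled)} of {len(X_pool)} candidate measurements")
```

In practice the seed points would come from the cheap data modalities and SAWEI would replace the pure-uncertainty acquisition, but the structure of the loop (fit, score, stop-or-query) is the same.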
We apply our approach to eight experimental composition-spread materials libraries, a common setup for high-throughput screening, with different levels of compositional complexity.
For those materials libraries we learn a surrogate model to predict electrical resistance as a function of composition.
Our active learning framework significantly reduces the number of required measurements, achieving a reduction of 87\% for some materials libraries using a single modality and a reduction of 85\% on average for all materials libraries using a multimodal cold-start strategy.
On average, we find that SAWEI outperforms uncertainty sampling.
In summary, we demonstrate a practical, multimodal cold-start active learning framework that accelerates autonomous experimental characterization on the path to autonomous materials discovery.
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=PYXlemktjM
Changes Since Last Submission: I received a desk rejection, so I did not revise the manuscript content.
Assigned Action Editor: ~Sarath_Chandar1
Submission Number: 8695