Feature Informed Batch Selection may Accelerate Training and Tuning of Chemical Foundation Models

Published: 03 Mar 2025 · Last Modified: 09 Apr 2025 · AI4MAT-ICLR-2025 Poster · CC BY 4.0
Submission Track: Findings & Open Challenges (Tiny Paper)
Submission Category: AI-Guided Design
Keywords: chemical foundation models, graph neural networks, online batch selection, gradient-based featurization, active learning, density-functional theory
TL;DR: FIBonAQi is a framework for applying gradient-based online batch selection to improve the training and fine-tuning efficiency of chemical foundation models by targeting the most informative samples.
Abstract:

Chemical foundation models pretrained on expansive materials databases have the potential to significantly accelerate materials discovery relative to traditional quantum-mechanical calculations. However, training and even fine-tuning these models remains expensive and not widely accessible, owing to the vast amount of data typically required and the complexity of optimization. To address this, we propose Feature Informed Batch Selection (FIBonAQi), a framework for improving the efficiency of training and fine-tuning foundation models by prioritizing the most informative training samples and density functional theory (DFT) calculations. Specifically, by using online batch selection strategies such as Diversified Batch Selection (DivBS) (Hong et al., 2024), originally tested on vision and natural language processing models, FIBonAQi aims to make the training and tuning of foundation ML models in chemistry more data-efficient than conventional uniform sampling. We evaluate the proposed approach in both training-from-scratch and fine-tuning scenarios. While more extensive testing is needed, preliminary results suggest that online batch selection strategies such as FIBonAQi-DivBS may improve data efficiency in chemical foundation model training.
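To illustrate the kind of diversified, gradient-feature-based selection the abstract describes, the sketch below greedily picks the samples whose (gradient) feature vectors have the largest component orthogonal to the span of the features already chosen. This is a simplified, assumed stand-in for DivBS-style selection, not the authors' implementation; the function name `diversified_batch_select` and the use of plain NumPy feature vectors are illustrative choices.

```python
import numpy as np

def diversified_batch_select(features, k):
    """Greedy diversified batch selection (simplified sketch).

    features : (n, d) array of per-sample feature vectors, e.g.
               last-layer gradient features as in gradient-based
               online batch selection.
    k        : number of samples to keep from the batch.

    Repeatedly selects the sample whose feature has the largest
    residual norm after projecting out the directions of the
    samples already selected, favoring informative AND diverse
    samples over plain top-k gradient-norm selection.
    """
    feats = np.asarray(features, dtype=float)
    n = feats.shape[0]
    selected = []
    basis = []  # orthonormal directions of selected features
    for _ in range(min(k, n)):
        residual = feats.copy()
        for b in basis:  # project out already-covered directions
            residual -= np.outer(residual @ b, b)
        norms = np.linalg.norm(residual, axis=1)
        norms[selected] = -np.inf  # never re-select a sample
        i = int(np.argmax(norms))
        selected.append(i)
        v = residual[i]
        nv = np.linalg.norm(v)
        if nv > 1e-12:
            basis.append(v / nv)
    return selected

# Toy usage: from a batch of 4 samples with random 8-dim
# gradient features, keep the 2 most informative/diverse ones.
rng = np.random.default_rng(0)
grad_feats = rng.normal(size=(4, 8))
kept = diversified_batch_select(grad_feats, 2)
```

In a training loop, the kept indices would determine which samples contribute to the gradient step (or which structures are sent for DFT labeling), replacing uniform sampling with informativeness-aware selection.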

AI4Mat Journal Track: Yes
Submission Number: 12