Chemical foundation models pretrained on expansive materials databases have the potential to significantly accelerate materials discovery relative to traditional quantum-mechanical calculations. However, training and even fine-tuning these models remains expensive and not widely accessible, owing to the vast amount of data typically required and the complexity of optimization. To address this, we propose Feature-Informed Batch Selection (FIBonAQi), a framework that improves the efficiency of training and fine-tuning foundation models by prioritizing the most informative training samples and density functional theory (DFT) calculations. Specifically, by using online batch selection strategies such as Diversified Batch Selection (DivBS) (Hong et al., 2024), originally tested on vision and natural language processing models, FIBonAQi aims to make training and tuning of foundation ML models in chemistry more data-efficient than conventional uniform sampling. We evaluate the proposed approach in both training-from-scratch and fine-tuning scenarios. While more extensive testing is needed, preliminary results suggest that online batch selection strategies such as FIBonAQi-DivBS may improve data efficiency in chemical foundation model training.
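To illustrate the kind of diversified batch selection the abstract refers to, the sketch below implements a simple greedy diversity criterion: at each step it picks the candidate whose feature vector has the largest residual norm after projecting out the directions of samples already chosen. This is a hedged, illustrative stand-in for DivBS-style selection (the function name `select_diverse_subset` and the greedy residual criterion are assumptions for this sketch, not the authors' or Hong et al.'s actual implementation).

```python
import numpy as np

def select_diverse_subset(features, k):
    """Greedily pick k samples that span diverse feature directions.

    Illustrative sketch only: repeatedly select the sample whose feature
    residual (after projecting out the already-selected directions) has the
    largest norm. Not the published DivBS algorithm.
    """
    # Normalize features so selection reflects direction, not magnitude.
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    residual = feats.copy()
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=1)
        norms[selected] = -np.inf  # never re-pick a chosen sample
        i = int(np.argmax(norms))
        selected.append(i)
        # Project the chosen direction out of all remaining candidates.
        d = residual[i] / (np.linalg.norm(residual[i]) + 1e-12)
        residual = residual - np.outer(residual @ d, d)
    return selected

# Toy usage: 8 candidate samples with 4-d features, keep the 3 most diverse.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
idx = select_diverse_subset(X, k=3)
print(idx)
```

In an online training loop, such a routine would be applied to each incoming candidate batch (using model-derived features), so that only the selected subset contributes gradient updates, which is the data-efficiency mechanism the abstract describes.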