Diversity-Aware Pretraining in Materials Learning via Task Similarity

Published: 02 Mar 2026, Last Modified: 02 Mar 2026, AI4Mat-ICLR-2026 Poster, License: CC BY 4.0
Keywords: Transfer Learning, Similarity, Diversity, Pretraining, Small Data, Finetuning, Task Similarity, Dataset Creation
TL;DR: Task similarity metrics can be used to create a diversity-aware pretraining dataset to improve transfer learning performance on small data tasks in materials learning.
Abstract: Large-scale datasets underpin recent advances in machine learning; however, in materials science and chemistry, data acquisition remains expensive, making performance in low-data regimes a central challenge. Pretraining and transfer learning are effective strategies in this setting, yet their success depends critically on the choice of pretraining tasks. Poorly selected tasks can yield marginal gains or induce negative transfer, while principled criteria for assembling pretraining datasets remain underexplored. In this work, we leverage task similarity metrics to move beyond selecting a single source task and instead construct diverse, representative pretraining task subsets. Using the similarity-derived structure among tasks, we show how pretraining datasets can be assembled to balance relevance and diversity, maximizing knowledge transfer under fixed data budgets. Experiments on the QM9 benchmark demonstrate that models pretrained on such diversity-aware task subsets perform comparably to models pretrained on substantially larger datasets assembled without regard to task relationships. These results identify task diversity as a key factor governing transfer efficiency and provide a practical strategy for scaling general-purpose models in materials science under high labeling costs.
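The abstract does not specify the selection algorithm, so the following is only an illustrative sketch of what similarity-driven, diversity-aware subset selection could look like: a greedy heuristic over a pairwise task-similarity matrix that trades off relevance to the target task against redundancy with already-selected tasks. The function name `select_diverse_tasks`, the `alpha` trade-off parameter, and the scoring rule are all assumptions for illustration, not the paper's method.

```python
import numpy as np


def select_diverse_tasks(similarity, target, budget, alpha=0.5):
    """Greedily select a pretraining task subset that balances relevance
    to the target task against diversity among already-chosen tasks.

    similarity : (n, n) symmetric matrix of pairwise task similarities in [0, 1]
    target     : index of the downstream target task
    budget     : number of pretraining tasks to select
    alpha      : relevance/diversity trade-off (assumed hyperparameter)
    """
    n = similarity.shape[0]
    candidates = [i for i in range(n) if i != target]
    selected = []
    for _ in range(budget):
        best, best_score = None, -np.inf
        for c in candidates:
            relevance = similarity[c, target]
            # Redundancy: highest similarity to any already-selected task,
            # so near-duplicate tasks are penalized.
            redundancy = max((similarity[c, s] for s in selected), default=0.0)
            score = alpha * relevance - (1.0 - alpha) * redundancy
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy example: 12 tasks with a random symmetric similarity matrix.
    rng = np.random.default_rng(0)
    S = rng.uniform(size=(12, 12))
    S = (S + S.T) / 2.0
    np.fill_diagonal(S, 1.0)
    print(select_diverse_tasks(S, target=0, budget=4))
```

Under this reading, `alpha` interpolates between pure relevance ranking (alpha = 1, pick the tasks most similar to the target) and pure diversity (alpha = 0, facility-location-style coverage of the task space); the fixed `budget` mirrors the paper's fixed data budget setting.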
Submission Track: Paper Track (Tiny Paper)
Submission Category: AI-Guided Design
Submission Number: 61