Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior
Keywords: scientific machine learning, scaling, transfer learning, neural operators, foundation models
TL;DR: We study the scaling and transfer learning performance of neural operators on different PDE systems to build a pathway towards foundation models for scientific machine learning.
Abstract: Pre-trained machine learning (ML) models have shown strong performance across a wide range of applications, in particular in natural language processing (NLP) and computer vision (CV). Here, we investigate how pre-training could be used for scientific machine learning (SciML) applications, specifically in the context of transfer learning. We characterize the transfer behavior of these models as (i) the pre-trained model size is scaled, (ii) the downstream training dataset size is scaled, (iii) the physics parameters are systematically pushed out of distribution, and (iv) a single model pre-trained on a mixture of different physics problems is adapted to various downstream applications. We find that, when fine-tuned appropriately, transfer learning can help reach desired accuracy levels with orders of magnitude fewer downstream examples than training from scratch (across different tasks, which can even be out-of-distribution), with consistent behavior across a wide range of downstream dataset sizes. We also find that, compared to training from scratch on new downstream tasks, fine-tuning these models yields larger performance gains as model size increases. These results hold for a broad range of partial differential equation (PDE) learning tasks. All in all, our results demonstrate the potential of the “pre-train and fine-tune” paradigm for SciML problems and point to a path towards building SciML foundation models. Our code is available as open source.
Submission Number: 10247