Provable in-context learning of linear systems and linear elliptic PDEs with transformers

Published: 11 Oct 2024 · Last Modified: 04 Nov 2024 · NeurIPS 2024 Workshop FM4Science Poster · CC BY 4.0
Keywords: In-context learning, transformer, elliptic PDE, scientific foundation model
TL;DR: We provide theoretical guarantees for in-context learning of the solution operator of a linear elliptic PDE using a pre-trained transformer defined by a single layer of linear attention.
Abstract: Foundation models for natural language processing, empowered by the transformer architecture, exhibit remarkable {\em in-context learning} (ICL) capabilities: pre-trained models can adapt to a downstream task by conditioning on only a few-shot prompt, without updating their weights. Recently, transformer-based foundation models have also emerged as universal tools for solving scientific problems, in particular partial differential equations (PDEs). However, the theoretical underpinnings of the ICL capabilities of these models remain elusive. This work develops a rigorous error analysis for transformer-based ICL of the solution operators associated with a family of linear elliptic PDEs. Specifically, we show that a linear transformer, defined by a single linear self-attention layer, can provably learn in-context to invert the linear systems arising from the spatial discretization of the PDEs. We derive theoretical scaling laws for the proposed linear transformers in terms of the size of the spatial discretization, the number of training tasks, and the lengths of the prompts used during training and inference, under both the in-domain generalization setting and various settings of distribution shift. Empirically, we validate the ICL capabilities of transformers through extensive numerical experiments.
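To make the setting concrete, below is a minimal NumPy sketch of how a single linear self-attention head can invert a linear system $Au = f$ in context from demonstration pairs $(f_i, u_i)$. This is not the authors' construction: the operator `A`, the Gaussian prompt distribution, and the particular key-query weight `Gamma` are illustrative assumptions chosen so that the attention output reduces to a least-squares estimate of $A^{-1}$ from the prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16       # size of the spatial discretization (number of grid points)
n_ctx = 64   # number of in-context demonstration pairs (f_i, u_i)

# Hypothetical task: a symmetric positive-definite matrix A standing in for the
# discretized elliptic operator; each demonstration satisfies A u_i = f_i.
L = rng.normal(size=(d, d)) / np.sqrt(d)
A = L @ L.T + np.eye(d)

F_ctx = rng.normal(size=(n_ctx, d))      # right-hand sides f_i (rows)
U_ctx = np.linalg.solve(A, F_ctx.T).T    # solutions u_i = A^{-1} f_i (rows)
f_query = rng.normal(size=d)
u_true = np.linalg.solve(A, f_query)

# One linear self-attention head on the prompt:
#   u_pred = (1/n) * sum_i u_i f_i^T * Gamma * f_query,
# where Gamma plays the role of the learned key-query weight matrix.
# With Gamma = n * (F_ctx^T F_ctx)^{-1}, the prediction equals the least-squares
# estimate of A^{-1} built from the prompt, so the system is inverted in context
# (exactly, once n_ctx >= d and F_ctx has full column rank).
Gamma = n_ctx * np.linalg.inv(F_ctx.T @ F_ctx)
u_pred = (U_ctx.T @ F_ctx / n_ctx) @ Gamma @ f_query

rel_err = np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)
print("relative error:", rel_err)  # ~ machine precision in this toy setting
```

In this sketch the prompt length and discretization size play the roles that the paper's scaling laws quantify: with fewer demonstrations than grid points the in-context estimate of $A^{-1}$ is no longer exact, and the error depends on how the prompt covers the discretized operator.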
Submission Number: 31