Transformer Designs for In-Context Learning in Foundation Models for Time Series Forecasting with Covariates
Keywords: foundation models, time series forecasting, covariates, in-context learning, patching, regression task, causal attention
TL;DR: (1) Analysis of transformer designs for ICL on regression tasks. (2) A modified shifted causal attention enables faster convergence. (3) Patching, popular in time series forecasting, hinders ICL when covariates are present.
Abstract: Recent foundation models (FMs) for time series forecasting (TSF) have shown promising results in zero-shot generalization to new series but are incapable of modeling series-specific dependence on covariates. We identify that historical values in TSF implicitly provide labeled data, which can be leveraged for in-context learning (ICL). While transformers have demonstrated ICL capabilities for regression tasks, their effectiveness as FMs depends on tokenization, attention type, and loss function placement during pre-training. We study three existing tokenization schemes and propose a modified shifted causal attention for faster convergence and effective ICL. This approach combines covariates and the target, enabling linear regression in a single layer. Our theoretical analysis shows that the popular method of patching the input series is suboptimal for ICL on time series with covariates.
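The abstract does not spell out the token construction, so the following is only a minimal, illustrative sketch of the two ingredients it describes, under assumed details: each token concatenates the current covariate vector with the previous target (one plausible reading of the "shift"), and a standard causal mask then lets a single attention layer treat the history as in-context labeled examples. All names (`build_tokens`, `causal_mask`, `masked_attention`) are hypothetical and not from the paper.

```python
import torch

def build_tokens(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """x: (T, d) covariates, y: (T,) targets.
    Token t = [x_t, y_{t-1}], with y_{-1} padded to 0 (an assumption)."""
    y_prev = torch.cat([torch.zeros(1), y[:-1]])         # shift targets by one step
    return torch.cat([x, y_prev.unsqueeze(-1)], dim=-1)  # (T, d + 1)

def causal_mask(T: int) -> torch.Tensor:
    """True where attention is allowed: position t attends to positions <= t.
    Combined with the shifted tokens above, position t never sees its own y_t."""
    return torch.tril(torch.ones(T, T, dtype=torch.bool))

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = (q @ k.transpose(-1, -2)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: past (covariate, target) pairs act as in-context labeled examples
# for predicting y_t from x_t, analogous to ICL on a linear regression task.
T, d = 16, 3
x = torch.randn(T, d)
y = x @ torch.randn(d) + 0.1 * torch.randn(T)   # linear ground truth + noise
tokens = build_tokens(x, y)
mask = causal_mask(T)
out = masked_attention(tokens, tokens, tokens, mask)  # one attention layer over the context
```

Note that in this reading, each token already mixes covariates with a (lagged) target value, which is what allows a single layer to aggregate the labeled history; patching, by contrast, would group several time steps per token and blur the per-step pairing of covariates and labels.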
Submission Number: 71