Transformer Designs for In-Context Learning in Foundation Models for Time Series Forecasting with Covariates
Keywords: foundation models, time series forecasting, covariates, in-context learning, patching, regression task, causal attention
TL;DR: (1) Analysis of transformer designs for ICL on regression tasks. (2) A modified shifted causal attention enables faster convergence. (3) Patching, popular in time series forecasting, hinders ICL when covariates are present.
Abstract: Recent foundation models (FMs) for time series forecasting (TSF) have shown promising results in zero-shot generalization to new series but are incapable of modeling series-specific dependence on covariates. We identify that historical values in TSF implicitly provide labeled data, which can be leveraged for in-context learning (ICL). While transformers have demonstrated ICL capabilities for regression tasks, their effectiveness as FMs depends on tokenization, attention type, and loss function placement during pre-training. We study three existing tokenization schemes and propose a modified shifted causal attention for faster convergence and effective ICL. This approach combines covariates and the target, enabling linear regression in a single layer. Our theoretical analysis shows that the popular method of patching the input series is suboptimal for ICL on time series with covariates.
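The abstract does not spell out the token construction, so the following is only a minimal, illustrative sketch of the two ingredients it describes, under assumed details: each token concatenates the current covariate vector with the previous target (one plausible reading of the "shift"), and a standard causal mask then lets a single attention layer treat the history as in-context labeled examples. All names (`build_tokens`, `causal_mask`, `masked_attention`) are hypothetical and not from the paper.

```python
import torch

def build_tokens(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """x: (T, d) covariates, y: (T,) targets.
    Token t = [x_t, y_{t-1}], with y_{-1} padded to 0 (an assumption)."""
    y_prev = torch.cat([torch.zeros(1), y[:-1]])         # shift targets by one step
    return torch.cat([x, y_prev.unsqueeze(-1)], dim=-1)  # (T, d + 1)

def causal_mask(T: int) -> torch.Tensor:
    """True where attention is allowed: position t attends to positions <= t.
    Combined with the shifted tokens above, position t never sees its own y_t."""
    return torch.tril(torch.ones(T, T, dtype=torch.bool))

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = (q @ k.transpose(-1, -2)) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: past (covariate, target) pairs act as in-context labeled examples
# for predicting y_t from x_t, analogous to ICL on a linear regression task.
T, d = 16, 3
x = torch.randn(T, d)
y = x @ torch.randn(d) + 0.1 * torch.randn(T)   # linear ground truth + noise
tokens = build_tokens(x, y)
mask = causal_mask(T)
out = masked_attention(tokens, tokens, tokens, mask)  # one attention layer over the context
```

Note that in this reading, each token already mixes covariates with a (lagged) target value, which is what allows a single layer to aggregate the labeled history; patching, by contrast, would group several time steps per token and blur the per-step pairing of covariates and labels.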
Submission Number: 71