Task Descriptors Help Transformers Learn Linear Models In-Context

Published: 18 Jun 2024 · Last Modified: 10 Jul 2024 · ICML 2024 Workshop ICL Poster · CC BY 4.0
Track: short paper (up to 4 pages)
Keywords: in-context learning, task description, optimization, linear regression
TL;DR: We show, both theoretically and empirically, that task descriptors help Transformers learn mean-varying linear regression in-context.
Abstract: Large language models (LLMs) exhibit strong in-context learning (ICL) ability, which allows the model to make predictions on new examples based on the given prompt. Recently, a line of research (Von Oswald et al., 2023; Akyürek et al., 2023; Ahn et al., 2023; Mahankali et al., 2023; Zhang et al., 2023) considered ICL for a simple linear regression setting and showed that the forward pass of Transformers simulates variants of gradient descent (GD) algorithms on the in-context examples. In practice, the input prompt usually contains two types of information: in-context examples and the task description. In this work, we theoretically investigate how the task description helps ICL. Specifically, our input prompt contains not only in-context examples but also a "task descriptor". We empirically show that the trained Transformer achieves significantly lower ICL loss when the task descriptor is provided. We further give a global convergence theorem, where the converged parameters match our experimental results.
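To make the setting concrete, here is a minimal sketch (not the authors' code) of how prompts for mean-varying linear regression ICL might be constructed with and without a task descriptor. The encoding details are assumptions: the descriptor token is taken to carry the task-specific mean of the weight prior, and each token is a (d+1)-dimensional vector holding x in the first d coordinates and y in the last.

```python
# Sketch of prompt construction for mean-varying linear regression ICL.
# Assumption (not stated in the abstract): the task descriptor encodes the
# task-specific mean mu of the prior over the regression weights w.
import numpy as np

def make_prompt(d=5, n_examples=10, with_descriptor=True, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    mu = rng.normal(size=d)                  # task-specific mean of w
    w = mu + rng.normal(size=d)              # w ~ N(mu, I): "mean-varying" prior
    xs = rng.normal(size=(n_examples + 1, d))
    ys = xs @ w                              # noiseless linear labels
    tokens = np.concatenate([xs, ys[:, None]], axis=1)
    tokens[-1, -1] = 0.0                     # query token: its label is hidden
    if with_descriptor:
        # Hypothetical descriptor token carrying mu, zero-padded in the
        # label slot so it matches the width of example tokens.
        descriptor = np.concatenate([mu, [0.0]])[None, :]
        tokens = np.concatenate([descriptor, tokens], axis=0)
    return tokens, ys[-1]                    # prompt matrix and target label

prompt, target = make_prompt()
print(prompt.shape)                          # (12, 6) with the descriptor, (11, 6) without
```

Comparing the trained model's query-token loss on prompts built with and without the descriptor is one way to reproduce the kind of comparison the abstract describes, under the assumed encoding above.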
Submission Number: 35