Track: short paper (up to 4 pages)
Keywords: in-context learning, task description, optimization, linear regression
TL;DR: We show, both theoretically and empirically, that task descriptors help Transformers learn mean-varying linear regression in-context.
Abstract: Large language models (LLMs) exhibit strong in-context learning (ICL) ability, which allows a model to make predictions on new examples based on the given prompt. Recently, a line of research (Von Oswald et al., 2023; Akyürek et al., 2023; Ahn et al., 2023; Mahankali et al., 2023; Zhang et al., 2023) considered ICL in a simple linear regression setting and showed that the forward pass of a Transformer simulates variants of gradient descent (GD) on the in-context examples. In practice, the input prompt usually contains two types of information: in-context examples and a task description. In this work, we theoretically investigate how the task description helps ICL. Specifically, our input prompt contains not only in-context examples but also a “task descriptor”. We empirically show that the trained Transformer achieves significantly lower ICL loss when the task descriptor is provided. We further give a global convergence theorem, where the converged parameters match our experimental results.
Submission Number: 35
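
Illustrative sketch (not the paper's construction or its trained Transformer): the abstract refers to one-step GD on in-context examples and to a task descriptor for mean-varying regression. The NumPy example below assumes, hypothetically, that inputs are drawn as x_i ~ N(mu, I), that labels are noiseless (y_i = <w, x_i>), and that the task descriptor is the input mean mu. Under those assumptions E[x x^T] = I + mu mu^T, so a GD step preconditioned by the inverse of that matrix recovers w in expectation, while a plain step does not. All function names are made up for illustration.

```python
import numpy as np

def sample_prompt(w, mu, n, rng):
    """Draw n in-context examples with x_i ~ N(mu, I) and noiseless y_i = <w, x_i>."""
    X = mu + rng.standard_normal((n, mu.shape[0]))
    return X, X @ w

def one_step_gd(X, y, precond=None):
    """One GD step from w = 0 on (1/2n) * sum_i (y_i - <w, x_i>)^2 with step size 1.

    The negative gradient at w = 0 is (1/n) * sum_i y_i x_i. `precond` is an
    optional matrix applied to it (identity when omitted).
    """
    g = X.T @ y / X.shape[0]
    return g if precond is None else precond @ g

def descriptor_preconditioner(mu):
    """Inverse of E[x x^T] = I + mu mu^T, built from the assumed descriptor mu."""
    d = mu.shape[0]
    return np.linalg.inv(np.eye(d) + np.outer(mu, mu))

# Assumed toy setting: dimension 5, 2000 in-context examples, mean 2 in every coordinate.
rng = np.random.default_rng(0)
d, n = 5, 2000
w_true = np.ones(d)
mu = 2.0 * np.ones(d)

X, y = sample_prompt(w_true, mu, n, rng)
w_plain = one_step_gd(X, y)                                # ignores the descriptor
w_desc = one_step_gd(X, y, descriptor_preconditioner(mu))  # uses the descriptor

print("error without descriptor:", np.linalg.norm(w_plain - w_true))
print("error with descriptor:   ", np.linalg.norm(w_desc - w_true))
```

In this toy setting the plain step concentrates around (I + mu mu^T) w rather than w, which is one simple way to see why knowing the mean could help; how the trained Transformer actually uses the descriptor is the subject of the paper itself.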