Keywords: In-context Learning, Task Composition, Chain-of-Thought
TL;DR: This work studies, empirically and theoretically, whether language models can compose skills demonstrated in in-context examples to perform composite tasks, and identifies their limitations.
Abstract: Composing basic skills from simple tasks to accomplish composite tasks is crucial for modern intelligent systems. We investigate the $\mathit{in}$-$\mathit{context}$ $\mathit{composition}$ ability of language models to perform composite tasks that combine basic skills demonstrated in in-context examples. This is more challenging than the standard setting, where skills and their composition can be learned during training or from contextual information. We conduct systematic experiments on various representative open-source language models, using linguistic and logical tasks designed to probe composition abilities. The results reveal that simple task examples can have a surprising $\mathit{negative}$ $\mathit{impact}$ on performance, because the models generally struggle to recognize and assemble the skills correctly, even with Chain-of-Thought examples. Theoretical analysis further shows that it is crucial to align examples with the corresponding steps of the composition. This insight motivates a method for the probing tasks, whose improved performance provides positive support for our analysis.
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 14676