Keywords: In-context Learning, Task Composition, Chain-of-Thought
TL;DR: We empirically and theoretically study whether language models can compose skills demonstrated in in-context examples to perform composite tasks, and we identify their limitations.
Abstract: Composing basic skills from simple tasks to accomplish composite tasks is crucial for modern intelligent systems. We investigate the $\mathit{in}$-$\mathit{context}$ $\mathit{composition}$ ability of language models to perform composite tasks that combine basic skills demonstrated in in-context examples. This is more challenging than the standard setting, where skills and their composition can be learned during training. We conduct systematic experiments on various representative open-source language models, using linguistic and logical tasks designed to probe composition abilities. The results reveal that simple task examples can have a surprising $\mathit{negative}$ $\mathit{impact}$ on performance, because the models generally struggle to recognize and assemble the skills correctly, even with Chain-of-Thought examples. Theoretical analysis further shows that it is crucial to align examples with the corresponding steps of the composition. This insight motivates a method for the probing tasks, whose improved performance supports our findings.
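To make the setting concrete, below is a minimal, hypothetical sketch of how an in-context composition prompt could be constructed: each demonstration shows a single basic skill, while the query implicitly requires combining them. The two skills used here (word uppercasing and word-order reversal) and all helper names are illustrative assumptions, not the paper's actual probing tasks.

```python
# Hypothetical illustration (not the paper's tasks): build a prompt whose
# in-context examples each demonstrate ONE basic skill, while the final
# query requires composing both skills.

def uppercase(words):
    """Basic skill A: capitalize every word."""
    return [w.upper() for w in words]

def reverse(words):
    """Basic skill B: reverse the word order."""
    return list(reversed(words))

def make_example(words, skill):
    """Render one in-context demonstration for a single skill."""
    return f"Input: {' '.join(words)}\nOutput: {' '.join(skill(words))}"

def build_prompt(skill_a_demos, skill_b_demos, query_words):
    """Examples show only the basic skills; the query needs their composition."""
    demos = [make_example(w, uppercase) for w in skill_a_demos]
    demos += [make_example(w, reverse) for w in skill_b_demos]
    query = f"Input: {' '.join(query_words)}\nOutput:"
    return "\n\n".join(demos + [query])

if __name__ == "__main__":
    prompt = build_prompt(
        skill_a_demos=[["red", "apple"], ["tall", "tree"]],
        skill_b_demos=[["blue", "sky"], ["old", "book"]],
        query_words=["fast", "brown", "fox"],
    )
    print(prompt)
    # The intended composite answer is "FOX BROWN FAST" (apply both skills),
    # which the model must infer without seeing any composite demonstration.
```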
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 14676