Keywords: in-context learning, meta-skill, compositional generalization
Abstract: This study investigates the task generalization exhibited by Transformer models. We hypothesize that Transformers generalize to unseen tasks by learning "meta-skills": high-level skills that enable a model to develop new skills through composition. To test this hypothesis, we conduct extensive in-context learning (ICL) experiments, treating ICL of a function class as a skill. Our experiments demonstrate that Transformers generalize well across tasks, as they can effectively in-context learn unseen function classes. This provides strong evidence for our hypothesis, as such generalization cannot occur without the learning of meta-skills. Furthermore, our results suggest that Transformers learn these meta-skills in a sample-efficient and unsupervised manner. Lastly, we show that the learned meta-skills generalize variadically, meaning they can be applied to compositions of a number of skills not seen during training. This hints at the possibility that Transformers possess strong weak-to-strong generalization abilities, enabling them to perform more reasoning or composition steps than they have been explicitly taught.
Submission Number: 22