Understanding In-context Learning with a Pelican Soup Hypothesis

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: In-context Learning, chain-of-thought
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a Pelican Soup Hypothesis that explains in-context learning as the generalization of modeling linguistic phenomena under distribution shifts.
Abstract: Motivated by Pelican Soup riddles, we propose a hypothesis, the Pelican Soup Hypothesis, to explain the in-context learning ability of large language models. We introduce a simple but general formalism for natural language classification problems. With this formalism, we show how in-context learning can be understood as the generalization of modeling certain linguistic phenomena under distribution shifts. We provide evidence supporting this hypothesis. First, we synthesize a dataset called Calcutec that replicates these linguistic phenomena and show that language models trained on this dataset acquire in-context learning ability and benefit from chain-of-thought. Second, our experiments with GPT-2 on natural language tasks show a link between one of the linguistic phenomena and in-context learning. Third, we use a digit addition task to inspect one of the identified types of distribution shift and find that larger models generalize better. Our contributions offer a way to better understand how and why in-context learning works, and our Calcutec and digit addition tasks will facilitate future studies on in-context learning.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4423