On the In-context Generation of Language Models

ACL ARR 2024 June Submission 4213 Authors

16 Jun 2024 (modified: 02 Aug 2024) · CC BY 4.0
Abstract: Large language models (LLMs) exhibit the ability of in-context generation (ICG): when fed an in-context prompt containing several similar examples, they can implicitly discover the shared pattern and complete the prompt in the same pattern. ICG is intriguing, since language models are not explicitly trained on prompts of this form, and the distribution of examples in the prompt differs from that of sequences in the pretraining corpora. This paper provides a systematic study of the ICG ability of language models, covering its source and influential factors from both theoretical and empirical perspectives. Concretely, we first propose a plausible latent variable model to describe the distribution of the pretraining corpora, and then formalize ICG as a problem of next-topic prediction. Within this framework, we prove that the repetition of a few topics in the pretraining distribution theoretically guarantees ICG ability on those topics. We then use this controllable pretraining distribution to generate several medium-scale synthetic datasets (2.1B-3.9B tokens) and experiment with Transformer architectures of different configurations (4M-234M parameters). Our experimental results further offer insights into how factors of data and model architecture influence ICG.
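To make the abstract's setup concrete, below is a minimal, hypothetical sketch of a latent variable model of the kind described: a document is a sequence of latent topics drawn from a Markov chain with a self-transition (repetition) bias, and each topic emits a span of tokens. The names and parameters (n_topics, repeat_prob, the disjoint per-topic vocabularies) are illustrative assumptions, not the paper's actual construction; ICG here would correspond to predicting the next topic from the prefix.

```python
# Hypothetical sketch: documents as topic sequences from a Markov chain,
# where each topic emits a span of tokens from its own vocabulary slice.
import numpy as np

rng = np.random.default_rng(0)

n_topics = 8          # number of latent topics (assumed)
vocab_per_topic = 50  # tokens owned by each topic (assumed)
repeat_prob = 0.6     # probability the next topic repeats the current one

# Topic transition matrix: mass `repeat_prob` on self-transitions, the rest
# spread uniformly over the other topics. This repetition bias is the kind
# of structure the abstract argues makes next-topic prediction learnable.
T = np.full((n_topics, n_topics), (1 - repeat_prob) / (n_topics - 1))
np.fill_diagonal(T, repeat_prob)

def sample_document(n_spans=10, span_len=5):
    """Sample one document as a (topic sequence, token sequence) pair."""
    topics, tokens = [], []
    z = rng.integers(n_topics)  # initial topic, uniform (assumed)
    for _ in range(n_spans):
        topics.append(int(z))
        # Each topic emits tokens from its disjoint vocabulary slice (assumed).
        lo = z * vocab_per_topic
        tokens.extend(rng.integers(lo, lo + vocab_per_topic, size=span_len))
        z = rng.choice(n_topics, p=T[z])  # transition to the next topic
    return topics, tokens

topics, tokens = sample_document()
print("topic sequence:", topics)   # repeated topics appear as runs
print("first tokens:  ", tokens[:10])
```

Sampling many such documents would give a controllable synthetic pretraining distribution in the spirit of the datasets the paper describes, with topic repetition as the tunable knob.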
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: In-context generation, latent variable models
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 4213