(Out-of-context) Meta-learning in Language Models

11 May 2023 (modified: 12 Dec 2023) · Submitted to NeurIPS 2023
Keywords: LLMs, QA, world models, internalization, consistency, meta-learning
TL;DR: Our experiments suggest that large language models may internalize statements that seem true, or text from authoritative sources, more readily than text that appears to come from an unreliable source.
Abstract: Brown et al. (2020) famously introduced the phenomenon of in-context meta-learning in large language models (LLMs). Our work establishes the existence of a phenomenon we call out-of-context meta-learning via carefully designed synthetic experiments with large language models. We show that out-of-context meta-learning leads LLMs to more readily “internalize” the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and apply it in appropriate contexts. We further demonstrate internalization in a synthetic computer vision setting, and propose two hypotheses for the emergence of internalization: one relying on the way models store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based methods may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks.
Supplementary Material: pdf
Submission Number: 14333