Keywords: meta-learning, general-purpose, in-context, transformers, learning-to-learn, meta-optimization, large-models, black-box
TL;DR: Transformers and other black-box models can exhibit in-context learning-to-learn that generalizes to significantly different datasets while undergoing multiple phase transitions in terms of their learning behavior.
Abstract: Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general-purpose learning algorithms from scratch, using only black box models with minimal inductive bias. Such a model takes in training data, and produces test-set predictions, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners. We characterize phase transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks, and meta-optimization. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count.