Model Recycling: Model component reuse to promote in-context learning

Published: 10 Oct 2024, Last Modified: 09 Nov 2024 · SciForDL Poster · CC BY 4.0
TL;DR: We find that recycling learned embeddings versus transformer weights has distinct effects on the subsequent emergence of ICL.
Abstract: In-context learning (ICL) is a behavior seen in transformer-based models where, during inference, the model can leverage examples of a novel task in order to perform accurately on that task. Here we study the role of different model components in ICL behavior via model component recycling. Previous work has found a plateau in the training loss before models begin to learn a general-purpose ICL solution. We design a model recycling experiment around ICL, investigating whether recycling model components can reduce this early plateau in the training loss and whether certain components affect ICL more than others. We find that transferring the embeddings and early transformer layers from a trained model to an untrained model eliminates the plateau seen in standard model training. In contrast, transferring only later transformer layers does not yield a significant reduction in the plateau, indicating the importance of the embeddings and early transformer layers for ICL performance.
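As an illustration of the component-recycling procedure the abstract describes (not code from the submission itself), the following PyTorch sketch shows how embeddings and early transformer blocks might be copied from a trained donor model into a freshly initialized one. The model definition and all names (`TinyTransformer`, `recycle_components`, the layer sizes) are hypothetical placeholders, not the authors' actual architecture or training setup.

```python
import torch.nn as nn


class TinyTransformer(nn.Module):
    """Minimal transformer stack used only to illustrate the recycling step."""

    def __init__(self, vocab_size=512, d_model=64, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)          # (batch, seq, d_model)
        for layer in self.layers:
            x = layer(x)
        return self.head(x)             # logits over the vocabulary


def recycle_components(trained, fresh, n_early_layers=2, copy_embeddings=True):
    """Copy the embeddings and the first `n_early_layers` transformer blocks
    from a trained model into a freshly initialized one; later layers and the
    output head keep their random initialization."""
    if copy_embeddings:
        fresh.embed.load_state_dict(trained.embed.state_dict())
    for i in range(n_early_layers):
        fresh.layers[i].load_state_dict(trained.layers[i].state_dict())
    return fresh


# Usage: treat `donor` as a model that has already been trained;
# `student` starts from scratch and receives the recycled components
# before its own training run.
donor = TinyTransformer()
student = TinyTransformer()
student = recycle_components(donor, student, n_early_layers=2)
```

In this setup, comparing training-loss curves of a `student` that received embeddings plus early layers against one that received only later layers would mirror the comparison the abstract describes.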
Style Files: I have used the style files.
Submission Number: 78