Where does In-context Learning Happen in Large Language Models?

Suzanna Sia; David Mueller; Kevin Duh

Where does In-context Learning Happen in Large Language Models?

Suzanna Sia, David Mueller, Kevin Duh

Published: 25 Sept 2024, Last Modified: 16 Jan 2025NeurIPS 2024 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Explainability, self-attention layers, redundancy, LLM, in-context learning, machine translation, text to code

TL;DR: We report that LLMs have a 'task recognition' point, where self-attention over the context is no longer necessary.

Abstract: Self-supervised large language models have demonstrated the ability to perform various tasks via in-context learning, but little is known about where the model locates the task with respect to prompt instructions and demonstration examples. In this work, we attempt to characterize the region where large language models transition from recognizing the task to performing the task. Through a series of layer-wise context-masking experiments on GPTNeo2.7B, Bloom3B, Starcoder2-7B, Llama3.1-8B, Llama3.1-8B-Instruct, on Machine Translation and Code generation, we demonstrate evidence of a "task recognition" point where the task is encoded into the input representations and attention to context is no longer necessary. Taking advantage of this redundancy results in 45% computational savings when prompting with 5 examples, and task recognition achieved at layer 14 / 32 using an example with Machine Translation. Our findings also have implications for resource and parameter efficient fine-tuning; we observe a correspondence between strong fine-tuning performance of individual LoRA layers and the task recognition layers.

Primary Area: Interpretability and explainability

Submission Number: 18864

Loading