Just-in-time and distributed task representations in language models

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: task representations, in-context learning, interpretability
TL;DR: We find that language models remain persistently sensitive to task identity in-context, but activate transferrable task representations only sporadically and for semantically minimal task scopes.
Abstract: Many of language models' impressive capabilities originate from their in-context learning: based on instructions or examples, they can infer and perform new tasks without weight updates. In this work, we investigate *when* representations for new tasks are formed in language models, and *how* these representations change over the course of the context. We study two kinds of task representations: "transferrable" ones (vector representations that can transfer a task context to another model instance, even without the full prompt) and simpler representations of high-level task categories. We show that transferrable task representations evolve in non-monotonic and sporadic ways, while task identity representations persist throughout the context. Specifically, transferrable task representations exhibit a two-fold locality. They successfully condense evidence as more examples are provided in the context. But this evidence-accrual process exhibits strong *temporal* locality along the sequence dimension, coming online only at certain tokens, despite task identity being reliably decodable throughout the context. In some cases, transferrable task representations also show *semantic* locality, capturing only a small task "scope" such as an independent subtask. Language models thus represent new tasks on the fly through both an inert, sustained sensitivity to the task and an active, just-in-time representation that supports inference.
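The abstract gives no implementation details, but the "transferrable" representations it describes match the task-vector recipe familiar from the in-context-learning literature: read out a hidden state from a prompted run, then patch it into an unprompted run of another model instance. The sketch below is a minimal illustration under that assumption; the model choice (`gpt2`), layer index, token position, and hook mechanics are all illustrative, not the paper's actual setup.

```python
# Hypothetical sketch of a "transferrable" task representation (task vector):
# extract a hidden state from an in-context run, then inject it into a
# zero-shot run of a second forward pass. Layer, token position, and model
# are illustrative assumptions, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: any causal LM with accessible hidden states
LAYER = 6       # assumption: a middle layer often carries task information

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def hidden_at_last_token(prompt: str, layer: int) -> torch.Tensor:
    """Run the model and return the residual-stream state of the final
    prompt token at `layer` (one candidate 'task vector')."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Few-shot context defining a toy antonym task.
context = "hot -> cold\nbig -> small\nfast ->"
task_vec = hidden_at_last_token(context, LAYER)

# Patch the vector into a separate forward pass that lacks the examples.
# hidden_states[LAYER] is the output of block LAYER - 1, so we hook there.
def patch_hook(module, inputs, output):
    hs = output[0] if isinstance(output, tuple) else output
    hs[0, -1] = task_vec  # overwrite the last token's state at this layer
    return output

handle = model.transformer.h[LAYER - 1].register_forward_hook(patch_hook)
zero_shot = tok("light ->", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(zero_shot).logits[0, -1]
handle.remove()

print(tok.decode(logits.argmax().item()))  # ideally the antonym, e.g. " dark"
```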
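Likewise, the claim that task identity is "reliably decodable throughout the context" suggests a per-position probing analysis. Below is a minimal sketch, assuming hidden states have already been collected for prompts drawn from several tasks; the probe type (logistic regression) and the cross-validation setup are assumptions, not the authors' protocol.

```python
# Hedged sketch: test whether task identity is linearly decodable at every
# token position by fitting one probe per position.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_accuracy_per_position(hiddens: np.ndarray,
                                task_labels: np.ndarray) -> np.ndarray:
    """hiddens: (n_prompts, n_positions, hidden_dim) activations collected
    from prompts of several tasks; task_labels: (n_prompts,) task ids.
    Returns cross-validated probe accuracy at each token position."""
    _, n_positions, _ = hiddens.shape
    accs = []
    for pos in range(n_positions):
        clf = LogisticRegression(max_iter=1000)
        scores = cross_val_score(clf, hiddens[:, pos], task_labels, cv=5)
        accs.append(scores.mean())
    # Uniformly high accuracy across positions would correspond to the
    # "persistent sensitivity to task identity" the abstract describes.
    return np.array(accs)
```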
Primary Area: interpretability and explainable AI
Submission Number: 14578