Soft Injection of Task Embeddings Outperforms Prompt-Based In-Context Learning

04 Sept 2025 (modified: 17 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Language Model, In-context Learning, Attention Head Attribution, Task Embedding Injection, Mechanistic Interpretability
Abstract: In-Context Learning (ICL) enables Large Language Models (LLMs) to perform tasks by conditioning on input-output examples in the prompt, without requiring any updates to model parameters. While widely adopted, it remains unclear whether prompting with multiple examples is the most effective and efficient way to convey task information. In this work, we propose Soft Injection of task embeddings at attention heads. The task embeddings are constructed only once from few-shot ICL prompts and reused repeatedly during inference. Soft injection softly mixes the pre-computed task embeddings with attention head activations using pre-optimized mixing parameters, referred to as soft head-selection parameters. This method not only allows a desired task to be performed without in-prompt demonstrations but also significantly outperforms few-shot ICL while reducing memory usage and compute cost at inference time. We perform an extensive evaluation across 57 tasks and 12 LLMs, spanning four model families with sizes from 4B to 70B. Averaged over the 57 tasks, our method outperforms 10-shot ICL by 10.2\%–14.3\% across the 12 LLMs. A series of analyses shows that our method also serves as an insightful tool for analyzing the task-relevant roles of attention heads, revealing that the task-relevant head positions it identifies transfer across similar tasks but not across dissimilar ones, uncovering the task-specific nature of head functionality. Our soft injection method significantly improves task performance and reveals task-specific attention heads, deepening the mechanistic understanding of the roles of attention heads in LLMs.
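
To make the mixing step concrete, the sketch below illustrates one plausible reading of the abstract: a per-head task embedding is built once from few-shot ICL prompt activations (mean-pooling here is an assumption, not stated in the abstract), and at inference each head's activation is convexly mixed with that embedding using a pre-optimized soft head-selection parameter. All names (`build_task_embedding`, `soft_inject`, `lam`) are hypothetical and this is not the authors' released implementation.

```python
import torch


def build_task_embedding(icl_head_acts: torch.Tensor) -> torch.Tensor:
    """Build a per-head task embedding once from few-shot ICL prompt activations.

    icl_head_acts: (num_prompts, num_heads, head_dim) head activations collected
        from few-shot ICL prompts (e.g., at the final token position).
    Returns: (num_heads, head_dim) task embedding, reused at inference time.
    Mean-pooling over prompts is an illustrative choice.
    """
    return icl_head_acts.mean(dim=0)


def soft_inject(head_acts: torch.Tensor, task_emb: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
    """Softly mix the pre-computed task embedding into attention-head activations.

    head_acts: (num_heads, head_dim) activations for the current (demonstration-free) input
    task_emb:  (num_heads, head_dim) pre-computed task embedding
    lam:       (num_heads,) pre-optimized soft head-selection parameters in [0, 1]
    """
    lam = lam.clamp(0.0, 1.0).unsqueeze(-1)          # (num_heads, 1)
    return (1.0 - lam) * head_acts + lam * task_emb  # convex per-head mixture
```

In practice such a mix would be applied via forward hooks at selected layers; the per-head parameters `lam` would be the soft head-selection parameters the paper describes as optimized in advance.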
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2081