TL;DR: LLMs fail to in-context learn the alignment between input and output sequences. When fine-tuned on in-context examples, they learn this alignment while relying on existing induction circuits to learn the token distribution.
Abstract: Large language models (LLMs) have demonstrated the capability to perform in-context learning (ICL) on completely unseen classification and language-completion tasks. Sequence-to-sequence (seq2seq) learning is another popular task category, with many applications that would benefit from quick adaptation via ICL. We present a systematic analysis of the ICL capability of LLMs on seq2seq tasks using a formal structured language pair. Our study reveals a critical limitation: except for very short input sequences, ICL fails to learn consistently across all output positions. This exposes a fundamental weakness of modern LLMs: their inability to effectively uncover the alignment between input and output sequences. This limitation, in turn, results in incomplete induction heads, which are essential for in-context learning of new discrete mappings.
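To make the per-position failure mode concrete, the sketch below sets up a toy seq2seq ICL evaluation. It is a hypothetical illustration, not the paper's code: the toy mapping, the "input -> output" prompt format, and the `complete` stub (a stand-in for a real LLM call) are all assumptions.

```python
# Hypothetical sketch: measuring per-position ICL accuracy on a toy
# seq2seq task. Replace `complete` with a call to an actual LLM.
import random

SRC = list("abcdef")
TGT = {s: s.upper() for s in SRC}  # toy token-level mapping

def sample_pair(length: int) -> tuple[str, str]:
    """Draw an input sequence and its target under the toy mapping."""
    x = [random.choice(SRC) for _ in range(length)]
    return " ".join(x), " ".join(TGT[t] for t in x)

def build_prompt(shots: int, length: int) -> tuple[str, str]:
    """Assemble a k-shot prompt in a simple 'input -> output' format."""
    lines = []
    for _ in range(shots):
        x, y = sample_pair(length)
        lines.append(f"{x} -> {y}")
    x, gold = sample_pair(length)
    lines.append(f"{x} -> ")
    return "\n".join(lines), gold

def complete(prompt: str) -> str:
    """Stand-in for an LLM generation call; returns nothing by itself."""
    return ""

def per_position_accuracy(trials: int, shots: int, length: int) -> list[float]:
    """Exact-match accuracy at each output position across trials."""
    hits = [0] * length
    for _ in range(trials):
        prompt, gold = build_prompt(shots, length)
        pred = complete(prompt).split()
        for i, g in enumerate(gold.split()):
            hits[i] += int(i < len(pred) and pred[i] == g)
    return [h / trials for h in hits]
```

Under the finding reported above, a real model's accuracy profile from such a probe would stay high at early output positions but degrade at later ones as the input length grows.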
To address this limitation, we propose ICA-Tune, a method for focused fine-tuning of an LLM using in-context examples. We present a mechanistic evaluation with two accuracy probes, showing how input-output alignment emerges in the middle layers of an LLM without direct supervision. This alignment leads to an abrupt jump in the completeness of the induction heads in higher layers. We show that, compared to standard fine-tuning, ICA-Tune enables more sample-efficient learning and better generalization to out-of-distribution (OOD) instances.
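The abstract describes fine-tuning on in-context examples rather than on bare input-output pairs. A minimal sketch of that training signal, assuming a HuggingFace causal LM, is below; the model id, separator convention, and hyperparameters are illustrative assumptions, and this is not the authors' implementation of ICA-Tune.

```python
# Minimal sketch (assumed setup, not the paper's code): fine-tune a causal
# LM on k-shot in-context prompts, with the loss restricted to the query's
# output tokens so the context examples supervise only via attention.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

def icl_step(prompt: str, target: str) -> float:
    """One update. `prompt` holds the in-context examples plus the query
    input; `target` is the query's output, the only span receiving loss."""
    p_ids = tok(prompt, return_tensors="pt").input_ids
    t_ids = tok(target, return_tensors="pt").input_ids
    input_ids = torch.cat([p_ids, t_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : p_ids.size(1)] = -100  # -100 = ignored by the LM loss
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
    return loss.item()
```

In this framing, each training example is itself a few-shot prompt, so the model is optimized for the ICL setting it will be evaluated in rather than for plain supervised mapping.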
Lay Summary: Large language models (LLMs) like ChatGPT can learn new tasks just by seeing examples, a skill known as in-context learning. But when the task involves mapping one sequence to another, such as translating a sentence or answering a question, the models often fall short, especially when the input is longer. We explored why this happens and discovered that these models don't naturally figure out how parts of the input relate to parts of the output. This missing link makes it harder for them to learn new sequence-to-sequence tasks just from examples.
To fix this, we designed ICA-Tune, a lightweight method that gently fine-tunes the model using only example prompts. It teaches the model to align inputs and outputs more clearly, without any extra labels or supervision. ICA-Tune makes language models better at learning new sequence tasks from a few examples and helps them generalize to unfamiliar problems, making them more flexible and reliable tools.
Primary Area: Deep Learning->Large Language Models
Keywords: Model adaptation, few-shot learning, sequence to sequence learning, understanding in-context learning in LLMs
Submission Number: 9895