In-context Learning and Gradient Descent Revisited

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone

TL;DR: Reassessing existing evidence for the ICL-GD hypothesis

Abstract: In-context learning (ICL) has shown impressive results in few-shot learning tasks, yet its underlying mechanism is still not fully understood. A recent line of work suggesting that ICL implicitly performs gradient descent (GD)-based optimization has captured the imagination of researchers. While promising, the formal results focus mainly on simplified linear settings, and realistic scenarios have received only a preliminary evaluation. In this work, we revisit the evidence for ICL-GD correspondence and find gaps in both evaluation and theory: flaws in the metrics and baselines, and insufficient theoretical justification. We show that information flows very differently in ICL and in GD (a discrepancy we dub Layer Causality), and that a variant of GD aligns with ICL consistently better than vanilla GD. We also find that even simple untrained transformers are on par with trained models on ICL-GD correspondence, which casts serious doubt on the existing evidence for the correspondence.

Paper Type: long

Research Area: Interpretability and Analysis of Models for NLP

Contribution Types: Model analysis & interpretability, Reproduction study

Languages Studied: English
