Keywords: In-Context Learning, Demonstration Selection
Abstract: Many studies show that not all demonstrations help in-context learning (ICL), which limits performance. In this paper, we therefore use gradient flow to analyze why demonstrations become ineffective. By setting the gradient flow to zero, we identify two cases of ineffectiveness: either the model has already learned the information, or the information is irrelevant to the query. We further prove that in a multi-layer attention model, disparities in effectiveness are amplified with depth, directing attention toward effective demonstrations. Building on this analysis, we propose GradS, which selects demonstrations via gradient-flow signals and explicitly accounts for information the model has already assimilated. We validate our derivation and GradS on four prominent LLMs across five mainstream datasets. The experiments confirm that the disparity in demonstration effectiveness is magnified as model depth increases, substantiating our derivations. Moreover, GradS achieves an average relative improvement of $1.3\%$ over the strongest baselines, setting new SOTA results in demonstration selection.
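The selection idea described in the abstract can be illustrated with a minimal sketch. The function and score values below are hypothetical (the paper's actual GradS scoring is not given here); the sketch only shows the high-level recipe: score each candidate demonstration by a gradient-flow signal, discard near-zero scores (which the abstract associates with information already learned by the model or irrelevant to the query), and keep the strongest remaining candidates.

```python
def select_demonstrations(scores, k, eps=1e-6):
    """Hypothetical GradS-style selector.

    scores: dict mapping demonstration id -> gradient-signal magnitude.
    Returns the ids of the k demonstrations with the largest signals,
    after dropping near-zero (ineffective) ones.
    """
    # Drop demonstrations whose gradient signal is (near) zero:
    # per the abstract, these are already learned or irrelevant.
    effective = {d: s for d, s in scores.items() if abs(s) > eps}
    # Rank the remainder by signal strength and keep the k strongest.
    ranked = sorted(effective, key=effective.get, reverse=True)
    return ranked[:k]

# Toy usage with made-up scores.
scores = {"demo_a": 0.9, "demo_b": 0.0, "demo_c": 0.4, "demo_d": 0.7}
print(select_demonstrations(scores, k=2))  # -> ['demo_a', 'demo_d']
```

In practice the scores would come from the gradient-flow analysis the paper derives; the thresholding step reflects the zero-gradient-flow condition that characterizes ineffective demonstrations.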
Paper Type: Long
Research Area: Language Models
Research Area Keywords: few-shot QA, prompting
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1512