Unveiling the Flow of Input-label Mappings for In-context Learning

13 Sept 2024 (modified: 07 Oct 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Language Model; In-Context Learning
Abstract: Large language models (LLMs) excel at processing complex tasks via in-context learning (ICL). However, the internal mechanisms behind ICL remain mysterious, e.g., where the input-label mappings are stored inside LLMs, and which modules allow these mappings to generalize to new questions. In this work, we take a substantial step towards reverse-engineering ICL: (1) Applying a linear model, i.e., Principal Component Analysis (PCA), to the hidden states, we find that LLMs distill the semantic mappings into the principal components (PCs) at only a small number of layers. (2) Traditional methods for identifying ability-related modules rely heavily on pairs of reference and counterfactual samples, designed to activate and not activate the behavior, respectively. However, because of the persistent nature of the ICL ability, it is difficult to design counterfactual discrete texts that do not involve ICL. To address this, we introduce PC Patching, which engineers the representation with the identified semantic mappings, rather than the text. By subtracting the PC from the original features, the input-label mappings are suppressed, yielding counterfactual continuous activations for ICL. The results of PC Patching reveal that a small fraction (5%) of attention heads drive LLMs to process the input-label mappings for the final answer. These insights prompt us to investigate the potential benefits of selectively fine-tuning these essential heads to boost the LLMs' ICL performance. We empirically find that such precise tuning can yield notable improvements on unseen ICL tasks. Promising applications in other scenarios, e.g., trustworthiness, further validate the effectiveness of our method. Our work serves as an exploration of ICL and paves the way toward scaling ICL to more intricate tasks.
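The abstract describes two concrete operations: fitting PCA on a layer's hidden states to locate the direction encoding the input-label mapping, and "PC Patching", i.e., subtracting that component to obtain counterfactual activations. Below is a minimal sketch of these two steps, assuming hidden states are available as a NumPy array; the layer choice, number of components, and array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA


def fit_label_directions(hidden_states: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Fit PCA on hidden states (shape [n_examples, d_model]) and return the
    top principal direction(s), shape [n_components, d_model]."""
    pca = PCA(n_components=n_components)
    pca.fit(hidden_states)
    return pca.components_


def pc_patch(activations: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Subtract the projection onto the identified principal directions,
    producing counterfactual activations in which the input-label mapping
    is suppressed (the 'PC Patching' idea described in the abstract)."""
    patched = activations.copy()
    for d in directions:
        d = d / np.linalg.norm(d)
        patched -= np.outer(patched @ d, d)  # remove the component along d
    return patched


# Toy usage: 32 in-context examples with hidden size 64 (hypothetical shapes).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(32, 64))
dirs = fit_label_directions(hidden, n_components=1)
counterfactual = pc_patch(hidden, dirs)
# In an activation-patching setup, these counterfactual activations would be
# substituted back into the model to measure which attention heads depend on
# the removed component.
```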
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 237