Keywords: In-context Learning, Language models, Zero-shot, Representation Learning
Abstract: Large language models (LLMs) have exhibited an impressive capability for In-Context
Learning (ICL), where LLMs perform relatively complicated tasks beyond the
pre-training objective by conditioning on the given demonstrations. Nevertheless,
ICL introduces two gaps between pre-training and inference: label appearance
(presence of inserted labels in the demonstrations) and weak semantic relevance
(independently sampled demonstrations exhibit less semantic coherence compared
to consecutive text segments in pre-training corpora). We propose a new inference
method that uses only unlabeled inputs from the test set and the label space. In this
method, we extract the representations of the demonstration inputs independently
and fuse them to reshape the representation of the test input for inference.
Interestingly, without access to labels, our method outperforms traditional ICL, which
has the extra information of gold labels. Furthermore, our method allows small models
to outperform the zero-shot performance of models that are twice their size (e.g.,
GPT-Neo-2.7B surpasses Llama2-7B, and Llama2-7B outperforms Llama2-13B).
Our code will be made available.
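The inference idea described in the abstract (encode each unlabeled demonstration input independently, fuse those representations into the test input's representation, then score the label space) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the hashed bag-of-words `encode` stands in for an LLM hidden representation, the weighted mean in `fuse` stands in for the fusion step, which the abstract does not specify, and the names `DIM`, `alpha`, and `predict` are hypothetical.

```python
import hashlib
import math

DIM = 64  # size of the toy representation (assumption, not from the paper)


def encode(text):
    """Toy stand-in for an LLM representation: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def fuse(test_vec, demo_vecs, alpha=0.5):
    """Blend the test representation with the mean demonstration representation.

    A weighted mean is an assumed placeholder for the paper's fusion step.
    """
    if not demo_vecs:
        return list(test_vec)
    mean = [sum(col) / len(demo_vecs) for col in zip(*demo_vecs)]
    return [(1 - alpha) * t + alpha * m for t, m in zip(test_vec, mean)]


def predict(test_input, unlabeled_demos, label_space, alpha=0.5):
    """Pick the label whose representation best matches the fused test representation."""
    # Each demonstration input is encoded on its own; no labels are used.
    demo_vecs = [encode(d) for d in unlabeled_demos]
    fused = fuse(encode(test_input), demo_vecs, alpha)
    scores = {
        label: sum(a * b for a, b in zip(fused, encode(label)))
        for label in label_space
    }
    return max(scores, key=scores.get), scores


label, scores = predict(
    "great movie",
    ["the plot was dull", "a wonderful cast"],  # unlabeled inputs only
    ["great", "terrible"],
)
print(label, scores)
```

In this sketch only the demonstration inputs and the label names are used; no gold labels appear anywhere, which mirrors the setting the abstract describes. A real system would replace `encode` with model hidden states and `fuse` with whatever fusion operator the method defines.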
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10425