Keywords: Language Model, Latent Space, In-Context Learning, Semantics, Disentanglement, Neural Clustering
TL;DR: We propose "vocabulary-defined semantics" to enhance in-context learning via latent space clustering. It mitigates the semantic gap between language models and downstream data, outperforming the state of the art in effectiveness, efficiency, and robustness.
Abstract: In-context learning enables language models (LMs) to adapt to downstream data or tasks by incorporating a few samples as demonstrations within the prompt. It offers strong performance without the expense of fine-tuning.
However, due to the context-length restriction, the demonstrations can cover only a small proportion of the usable samples. This limitation exacerbates the difficulty of optimization, since the performance of in-context learning can be unstable depending on the quality, format, or order of the demonstrations.
Prior work, such as kNN Prompting, indexes samples based on the similarity of logits at the output side, in addition to the regular retrieval operation at the input side.
It improves in-context learning by leveraging the core ability of next-token prediction, rather than relying solely on the emergent capacity to make analogies.
Despite this, the hard-to-optimize issue of in-context learning persists. In our view, it stems from the process of selecting demonstrations. To address it, we propose complementing in-context learning with an additional clustering operation, making full use of all usable samples.
We propose a novel approach, "vocabulary-defined semantics".
Grounded in the LM vocabulary, which is the label space of model outputs, the approach computes semantically equivalent latent representations for the output labels. Then, taking these representations as centroids, a clustering operation aligns the semantic properties of the language model with the downstream data or tasks.
Based on extensive experiments across diverse textual-understanding datasets and multiple models, our approach outperforms the state of the art in both effectiveness and efficiency. On average, it achieves $3\%-49\%$ improvements via the clustering module, while requiring only half the computation time thanks to the similarity-based logits computation.
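For concreteness, below is a minimal PyTorch sketch of the idea summarized in the abstract. It rests on our own illustrative assumptions, not the authors' exact formulation: the latent representation of each output label is taken from the LM head's unembedding rows for hypothetical verbalizer tokens, the label centroids are then aligned with labeled downstream samples by averaging, and predictions use similarity-based logits (cosine similarity to the nearest centroid). Random tensors stand in for real hidden states.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: in practice, `train_hidden`/`test_hidden` would be the LM's
# last-layer hidden states for each sample, and `lm_head` the LM's output
# (unembedding) matrix.
hidden_dim, vocab_size, num_labels = 64, 1000, 2
lm_head = torch.randn(vocab_size, hidden_dim)        # [V, D]
label_token_ids = [17, 42]                            # hypothetical verbalizer token ids
train_hidden = torch.randn(128, hidden_dim)           # hidden states of all usable samples
train_labels = torch.randint(0, num_labels, (128,))
test_hidden = torch.randn(16, hidden_dim)


def similarity_logits(h, centroids):
    """Similarity-based logits: cosine similarity of each hidden state to each
    label centroid, instead of a full-vocabulary projection."""
    return F.cosine_similarity(h.unsqueeze(1), centroids.unsqueeze(0), dim=-1)


# 1) Vocabulary-grounded centroids: one latent representation per output label,
#    here taken directly from the unembedding rows of the label tokens (an
#    assumption for illustration).
centroids = lm_head[label_token_ids].clone()          # [K, D]

# 2) Clustering step: align each centroid with the downstream data by blending
#    it with the mean hidden state of the samples carrying that label
#    (the 0.5 blend factor is arbitrary here).
for k in range(num_labels):
    members = train_hidden[train_labels == k]
    if len(members) > 0:
        centroids[k] = 0.5 * centroids[k] + 0.5 * members.mean(dim=0)

# 3) Prediction: assign each test sample to its most similar centroid.
pred = similarity_logits(test_hidden, centroids).argmax(dim=-1)
print(pred)
```

In this reading, the clustering step uses every labeled sample rather than only the few that fit into the prompt as demonstrations, which is how the approach sidesteps the demonstration-selection problem described above.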
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10979