Keywords: Explainability, Representation analysis, LLM, prompting, zero-shot
TL;DR: Geometrical analysis of prompt impact on latent representation in LLMs
Abstract: The effectiveness of zero-shot learning frameworks, particularly with Large Language Models (LLMs), has lately improved tremendously. Nonetheless, zero-shot performance depends critically on prompt quality. The scientific literature has proposed many methods to select, create, and evaluate prompts from a linguistic or performance perspective, either by changing their phrasing or by constructing them from heuristic rules. While these approaches are intuitive, they do not reveal the internal mechanisms of LLMs. In this work, we explore the impact of prompts on the latent representations of auto-regressive transformer models in a zero-shot setting. We focus on the geometrical properties of the prompts' internal representations at different stages of the model. Our experiments give insights into how prompt characteristics influence the structure and distribution of vector representations in generative models. We focus on binary classification tasks, on which prompting methods have shown robust performance, and show that prompt formulation does influence the latent representation. However, this influence depends on the model family. Using clustering methods, we show that prompts that are similar in natural language can, surprisingly, have different representations. This effect is highly model-dependent, demonstrating the need for more precise analysis.
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11121
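
The abstract describes comparing prompt representations geometrically and grouping them with clustering methods. The following is a minimal sketch of that kind of analysis, not the authors' pipeline: it assumes the hidden-state vectors for each prompt have already been extracted from one layer of a model, and it substitutes synthetic vectors for them. The function names (`cosine_similarity_matrix`, `farthest_point_init`, `kmeans`), the choice of cosine similarity and k-means, and all numeric settings are illustrative assumptions.

```python
import numpy as np


def cosine_similarity_matrix(reps):
    """Pairwise cosine similarity between the rows of `reps`."""
    reps = np.asarray(reps, dtype=float)
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def farthest_point_init(reps, k):
    """Deterministic initial centers: start at row 0, then repeatedly
    pick the row farthest from all centers chosen so far."""
    centers = [reps[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(reps - c, axis=1) for c in centers], axis=0
        )
        centers.append(reps[dists.argmax()])
    return np.array(centers)


def kmeans(reps, k, n_iter=50):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per row."""
    reps = np.asarray(reps, dtype=float)
    centers = farthest_point_init(reps, k)
    for _ in range(n_iter):
        # Distance from every representation to every center.
        dists = np.linalg.norm(reps[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            reps[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for per-prompt hidden states at a single layer:
# two hypothetical prompt groups of 5 prompts each, in 16 dimensions.
rng = np.random.default_rng(0)
group_a = 1.0 + rng.normal(scale=0.1, size=(5, 16))
group_b = -1.0 + rng.normal(scale=0.1, size=(5, 16))
reps = np.vstack([group_a, group_b])

sim = cosine_similarity_matrix(reps)
labels = kmeans(reps, k=2)
print(labels)
```

With real models, `reps` would hold one vector per prompt taken from a chosen layer, and the same comparison could be repeated layer by layer to see where prompt formulations diverge.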