Probing Language Models on Their Knowledge Source

Published: 21 Sept 2024 · Last Modified: 06 Oct 2024 · BlackboxNLP 2024 ARR Submissions · CC BY 4.0
Keywords: Interpretability and Analysis of Models for NLP, Mechanistic Interpretability, Probing
TL;DR: Language models can face conflicts between their learned parametric knowledge (PK) and the contextual knowledge (CK) provided at inference. We develop a probing framework to investigate this and show that specific activations indicate which knowledge source is prioritized.
Abstract: Large Language Models (LLMs) often encounter conflicts between their learned, internal knowledge (parametric knowledge, PK) and the external knowledge provided during inference (contextual knowledge, CK). Understanding how LLMs prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between PK and CK in LLMs. Using controlled prompts designed to contradict the model's PK, we demonstrate that specific model activations are indicative of the knowledge source employed. We evaluate this framework on LLMs of various sizes and show that mid-layer activations, particularly those related to relations in the input, are crucial for predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.
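To make the probing idea concrete, below is a minimal sketch (not the authors' code) of the kind of experiment the abstract describes: extract mid-layer activations on prompts whose context contradicts the model's PK, then train a linear probe to predict whether the model answered from PK or CK. The model name, example prompts, labels, and the use of the last-token position are all illustrative assumptions; the paper targets relation tokens specifically and evaluates multiple models and layers.

```python
# Hedged sketch of a knowledge-source probe: mid-layer activations -> PK/CK label.
# Everything below (model, prompts, labels) is a stand-in, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

model_name = "gpt2"  # stand-in; the paper evaluates LLMs of various sizes
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Hypothetical probing set: prompts whose context contradicts the model's PK,
# each labeled by which knowledge source the model's answer actually reflected.
examples = [
    ("The Eiffel Tower is located in Rome. The Eiffel Tower is located in", "ck"),
    ("The Eiffel Tower is located in Rome. Everyone knows the Eiffel Tower is in", "pk"),
    ("The capital of France is Lyon. The capital of France is", "ck"),
    ("The capital of France is Lyon. In reality, the capital of France is", "pk"),
    # ... many more labeled (prompt, source) pairs in practice
]

layer = model.config.num_hidden_layers // 2  # mid layer, as the paper highlights

features, labels = [], []
with torch.no_grad():
    for prompt, source in examples:
        inputs = tokenizer(prompt, return_tensors="pt")
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, d_model)
        # Last-token activation as a simple proxy for the probed position;
        # the paper probes relation tokens in the input specifically.
        features.append(hidden[0, -1].numpy())
        labels.append(1 if source == "ck" else 0)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.5, stratify=labels, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy at layer {layer}: {probe.score(X_test, y_test):.2f}")
```

In a setup like the paper's, such probes would be trained per layer and per token position and their accuracies compared; the sketch fixes one mid layer and the final token only for brevity.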
Comment: Dear Program Chairs, We appreciate the feedback from the ARR reviewers and the Area Chair, and will address all comments and suggestions in the revised version of our paper. We believe that with additional pages we can further clarify and significantly enhance the quality of our paper. More specifically, we will:
- Include the control experiments (already conducted) recommended by Reviewer kdaB, and provide more detailed information about the dataset, including a sample example, as suggested by Reviewer WUU8.
- Discuss the limitations of our paper regarding the use of diverse forms of prompts intended to eliminate potential biases related to copying (as mentioned by Reviewer kdaB), as well as the case where the LM uses neither contextual nor parametric knowledge to predict the object (labeled ND, Not Defined, in our paper).
- Extend and refine the abstract to make it more informative, as advised by the Area Chair.
Thank you for your consideration.
Paper Link: https://openreview.net/forum?id=qU4F4mDnXw
Submission Number: 5