Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension

Published: 01 Jan 2025, Last Modified: 30 Mar 2026 | OpenReview Archive Direct Upload | Everyone | arXiv.org perpetual, non-exclusive license
Abstract: Understanding how large language models (LLMs) acquire, retain, and apply knowledge remains an open challenge. This paper introduces a novel framework, K-(CSA)², which categorizes LLM knowledge along two dimensions: correctness and confidence. The framework defines six categories of knowledge, ranging from highly confident correctness to confidently held misconceptions, enabling a nuanced evaluation of model comprehension beyond binary correctness. Using this framework, we show how chain-of-thought (CoT) prompting and reinforcement learning from human feedback (RLHF) alter internal (pre-trained) and external (context-dependent) knowledge structures. We find that CoT improves base model performance and synergizes with aligned LLMs. Layer analysis reveals that higher layers encode high-confidence knowledge, while low-confidence knowledge emerges in the middle-to-lower layers.