Keywords: Knowledge Editing, Generalization, Large Language Models, Mechanistic Interpretability
Abstract: Large language models (LLMs) store factual knowledge at scale, and recent editing methods enable localized updates to target facts. While effective within a single prompt style, prior evaluations have largely overlooked whether edits generalize across task formats. In this work, we present a systematic study of cross-format generalization in parametric knowledge editing. Across six curated task formats, we show that edits that succeed in the source format often fail to transfer, and that high frequency of a fact in training does not guarantee a shared, format-invariant knowledge structure. Representation-level analyses reveal that edit directions cluster by task format, supporting a *distributed knowledge subspace hypothesis* in which knowledge is spread across multiple format-specific subspaces. To mitigate these failures, we introduce multi-format supervision with iterative expansion, which improves transferability with minimal overhead. Our study reframes generalization failure not only as a practical limitation of current editing methods but also as evidence about the distributed memory mechanisms underlying factual knowledge in language models.
Primary Area: interpretability and explainable AI
Submission Number: 8903