Abstract: Large language models (LLMs) have demonstrated strong capabilities in encoding and applying factual knowledge, much of which follows a one-to-many (1-to-N) structure, where a single query corresponds to multiple valid answers.
However, existing metrics for evaluating 1-to-N knowledge suffer from inherent limitations, such as ignoring valid alternative answers, failing to reflect model confidence, or neglecting probability distributions.
To address these limitations, we propose a new metric, named N-Answer Kullback-Leibler Divergence (NKL), which aligns the predicted probability distribution of an LLM with a given gold distribution (e.g., one derived from a pre-training corpus). NKL integrates both ranking and probability information, offering a more comprehensive evaluation.
We also formalise 1-to-N knowledge evaluation with two criteria—coverage and alignment—under which NKL demonstrates the best overall performance. Experiments on Counterfact and SNOMED CT further validate NKL’s effectiveness in knowledge probing and editing, providing new insights into LLMs’ ability to represent and modify 1-to-N knowledge.
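Since the abstract does not spell out the exact formulation of NKL, the following is a minimal sketch of the underlying idea, assuming NKL reduces to a KL divergence between a gold distribution over the N valid answers and the model's predicted probabilities renormalised onto that answer set; the function and variable names (`n_answer_kl`, `gold_probs`, `model_probs`) are illustrative, not the paper's implementation.

```python
import math

def n_answer_kl(gold_probs, model_probs, eps=1e-12):
    """Sketch of an NKL-style score: KL(gold || model) over the N valid answers,
    after renormalising the model's probability mass onto that answer set."""
    z = sum(model_probs)
    model = [max(p / z, eps) for p in model_probs]  # restrict and renormalise
    gold = [max(p, eps) for p in gold_probs]        # clip zeros for log stability
    return sum(g * math.log(g / m) for g, m in zip(gold, model))

# Example: a 1-to-N query with three valid answers; the gold distribution could
# be estimated from answer frequencies in a pre-training corpus (assumption).
gold = [0.6, 0.3, 0.1]   # assumed corpus-derived answer frequencies
pred = [0.2, 0.5, 0.3]   # model probabilities assigned to the same answers
print(f"NKL-style divergence: {n_answer_kl(gold, pred):.4f}")
```

A score of zero would indicate that the model's distribution over the valid answers matches the gold distribution exactly, with larger values indicating greater misalignment.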
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: 1-to-N knowledge, evaluation metric, knowledge probing, knowledge editing
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 1404