Abstract: Large language models (LLMs) face rapid obsolescence because the information they store can quickly become outdated, and retraining them is expensive. Efficient methods for knowledge editing of LLMs are therefore crucial. Existing datasets for knowledge editing typically assume that new knowledge is injected as a simple sentence describing a single tuple, such as "Ellie Kemper is a citizen of the United States of America".
However, we argue that these datasets are inadequate for evaluating real-world scenarios. In-the-wild text from natural settings often contains ambiguous relationships between entities and rarely describes a single tuple in isolation. This mismatch can cause a substantial drop in the performance of existing methods.
In this study, we present a new dataset, MQuAKE-Wild, in which new knowledge is presented in a style that resembles naturally occurring text. The dataset provides a benchmark for evaluating existing methods in scenarios more representative of real-world applications. Our findings indicate that current methods perform poorly on this dataset. To tackle the challenge, we propose a novel architectural design, MuRef, which leverages natural data to refine the relationships between entities. Compared with existing methods, ours achieves superior performance on wild data.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: knowledge graphs, knowledge base QA
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 2098