Investigating Knowledge Unlearning in Large Language Models via Multi-Hop Queries

ACL ARR 2025 February Submission 3522 Authors

15 Feb 2025 (modified: 09 May 2025) · CC BY 4.0
Abstract: Large language models (LLMs) serve as giant information stores, often including personal or copyrighted data, and retraining them from scratch is not a viable option for removing it. This has led to the development of various fast, approximate unlearning techniques that selectively remove knowledge from LLMs. Prior research has largely focused on minimizing the probabilities of specific token sequences by reversing the language modeling objective. However, these methods may still leave LLMs vulnerable to adversarial attacks that exploit indirect references. In this work, we examine the limitations of current unlearning techniques in effectively erasing a particular type of indirect prompt: multi-hop queries. Our findings reveal that existing methods fail to completely remove multi-hop knowledge when one of the intermediate hops is unlearned. To address this issue, we introduce MemMUL, a simple memory-based approach that stores all forgotten facts externally and filters multi-hop queries based on their respective scores. We demonstrate that MemMUL achieves results comparable to those of GPT-4o using a 7B model and outperforms previous unlearning methods by a large margin, establishing it as a strong, efficient baseline for multi-hop knowledge unlearning.
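Since the paper itself is not shown here, the following is a minimal, hypothetical sketch of the memory-based filtering idea the abstract describes: forgotten facts live in an external store, each incoming query is scored against that store, and high-scoring queries (including multi-hop ones that touch an unlearned fact) are refused before reaching the model. The token-overlap scoring, the threshold value, and all names (`ForgottenFactStore`, `should_refuse`, etc.) are illustrative assumptions, not MemMUL's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ForgottenFactStore:
    """External memory of facts the model must not reveal.

    The Jaccard token-overlap score below is a stand-in; the
    abstract does not specify MemMUL's actual scoring function.
    """
    facts: list[str] = field(default_factory=list)
    threshold: float = 0.3  # illustrative cutoff, not from the paper

    def add(self, fact: str) -> None:
        self.facts.append(fact)

    def score(self, query: str) -> float:
        """Highest overlap between the query and any stored fact."""
        q = set(query.lower().split())
        best = 0.0
        for fact in self.facts:
            f = set(fact.lower().split())
            if q and f:
                best = max(best, len(q & f) / len(q | f))
        return best

    def should_refuse(self, query: str) -> bool:
        """Filter out queries whose score exceeds the threshold."""
        return self.score(query) >= self.threshold


store = ForgottenFactStore()
store.add("alice smith was born in springfield")

# A query touching the unlearned fact is filtered before the LLM sees it.
print(store.should_refuse("where was alice smith born"))       # True
# An unrelated query passes through to the model unchanged.
print(store.should_refuse("what is the capital of france"))    # False
```

In this reading, the external store lets the filter catch indirect and multi-hop phrasings that gradient-based unlearning can miss, since the decision depends on similarity to the forgotten facts rather than on the model's suppressed token probabilities.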
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: security/privacy
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 3522