Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning

ACL ARR 2024 June Submission358 Authors

10 Jun 2024 (modified: 02 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: As large language models (LLMs) are applied across diverse domains, the ability to selectively unlearn specific information has become increasingly essential. For instance, LLMs must be capable of providing confidential information to authorized internal users, such as employees or trusted partners, while withholding it from external users, including the general public or unauthorized entities. In response to this challenge, we propose a novel method termed ''in-context knowledge unlearning'', which enables the model to selectively forget information in real time based on the context of the query. Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context, while preserving other knowledge. We also propose an F1-based evaluation metric to assess the performance of in-context knowledge unlearning, balancing the trade-off between unlearning target knowledge and retaining other knowledge. Experiments conducted on the TOFU and AGE datasets with the Llama2-7B/13B and Mistral-7B models demonstrate that our method achieves scores of 70-80 points on the proposed metric, significantly outperforming the baseline method. Further investigation into the model's internal behavior revealed that while fine-tuned LLMs generate correct predictions in the middle layers and maintain them up to the final layer, they make the decision to forget at the last layer, i.e., ''LLMs pretend to forget''. Our findings offer valuable insights into enhancing the robustness of unlearning mechanisms in LLMs, setting a foundation for future research in the field.
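The abstract describes an F1-based metric that balances forgetting target knowledge against retaining other knowledge. The exact definition is not given on this page; the following is a minimal Python sketch assuming the metric is a harmonic mean of a forget score and a retain score, with both function names and the example values being illustrative rather than taken from the paper.

# Hypothetical sketch of an F1-style unlearning score. The authors' exact
# formulation may differ; this only illustrates the balancing idea.
def unlearning_f1(forget_score: float, retain_score: float) -> float:
    """Harmonic mean of a forget score (how reliably target knowledge is
    withheld) and a retain score (how reliably other knowledge is still
    answered correctly). Both inputs are assumed to lie in [0, 1]."""
    if forget_score + retain_score == 0:
        return 0.0
    return 2 * forget_score * retain_score / (forget_score + retain_score)

# Example: strong forgetting with weaker retention yields a balanced score.
print(unlearning_f1(forget_score=0.9, retain_score=0.7))  # ~0.79

A harmonic mean is a natural choice here because it penalizes models that achieve one objective (e.g., aggressive forgetting) at the expense of the other.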
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Machine unlearning, In-context unlearning, Right to be forgotten, Approximate data deletion
Languages Studied: English
Submission Number: 358