Keywords: Large Language Models, Machine Unlearning
TL;DR: We propose a new Machine Unlearning requirement for LLMs and a method to achieve the requirement.
Abstract: Existing studies have reported that Web corpora used to train large language models (LLMs) may contain undesirable information, such as personally identifiable information, leading to privacy violations when LLMs are deployed.
To address this problem, Machine Unlearning (MU), which aims to remove arbitrary information from AI models, has attracted attention.
However, existing MU methods handle deletion requests for specific data points in an AI model, and it is difficult for them to handle requests to delete a specific concept from an LLM (e.g., a person's name).
This paper proposes a new MU requirement called Concept Unlearning (CU) to make LLMs forget arbitrary concepts from the perspective of a knowledge graph.
This allows us to define forgetting in terms of "knowledge," which is more intuitive to humans, and to design effective forgetting methods for LLMs.
We also propose a method that realizes CU by using an LLM to generate appropriate token sequences and applying gradient ascent to those sequences.
We confirm the effectiveness of our method on a dataset constructed from Wikipedia, using LLM-as-a-Judge for evaluation.
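The gradient-ascent idea mentioned above can be illustrated on a toy bigram language model. This is only a minimal sketch of the general technique, not the paper's implementation: the vocabulary, sequences, and update rule here are illustrative assumptions.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def unlearn_step(W, seq, lr=0.5):
    """One gradient-ascent step on the negative log-likelihood of `seq`
    (a list of token ids), lowering the model's probability of each
    bigram in the sequence. W[prev][next] holds bigram logits."""
    for prev, nxt in zip(seq, seq[1:]):
        probs = softmax(W[prev])
        for j in range(len(W[prev])):
            grad = probs[j] - (1.0 if j == nxt else 0.0)  # d NLL / d logit_j
            W[prev][j] += lr * grad  # ascent on NLL (descent would subtract)
    return W

# toy 3-token vocabulary with uniform initial bigram logits
V = 3
W = [[0.0] * V for _ in range(V)]
forget_seq = [0, 1]  # token sequence standing in for a "concept" to forget

p_before = softmax(W[0])[1]
for _ in range(10):
    unlearn_step(W, forget_seq)
p_after = softmax(W[0])[1]
print(p_before, p_after)  # probability of the forgotten bigram drops
```

In an actual LLM the same update would be applied to the network weights via the autograd of the training framework; the point of the sketch is that ascending the loss on the generated token sequences pushes their likelihood down.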
Submission Number: 44