Keywords: Large Language Models, Machine Unlearning
TL;DR: We propose a new Machine Unlearning requirement for LLMs and a method to achieve the requirement.
Abstract: Existing studies have reported that Web corpora used to train large language models (LLMs) may contain undesirable information, such as personally identifiable information, leading to privacy violations when LLMs are deployed.
To address this problem, Machine Unlearning (MU), which aims to remove arbitrary information from AI models, has attracted attention.
However, existing MU methods handle deletion requests for specific data points in an AI model, and it is difficult for them to handle requests to delete a specific concept from an LLM (e.g., a person's name).
This paper proposes a new MU requirement called Concept Unlearning (CU) to make LLMs forget arbitrary concepts from the perspective of a knowledge graph.
This allows us to define forgetting in terms of "knowledge," which is more intuitive to humans, and to design effective forgetting methods for LLMs.
We also propose a method that realizes CU by using an LLM to generate appropriate token sequences and applying gradient ascent to those sequences.
We confirm the effectiveness of our method on a dataset constructed from Wikipedia, using LLM-as-a-Judge for evaluation.
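The gradient-ascent idea mentioned above can be illustrated on a toy bigram language model. This is only a minimal sketch of the general technique, not the paper's implementation: the vocabulary, sequences, and update rule here are illustrative assumptions.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def unlearn_step(W, seq, lr=0.5):
    """One gradient-ascent step on the negative log-likelihood of `seq`
    (a list of token ids), lowering the model's probability of each
    bigram in the sequence. W[prev][next] holds bigram logits."""
    for prev, nxt in zip(seq, seq[1:]):
        probs = softmax(W[prev])
        for j in range(len(W[prev])):
            grad = probs[j] - (1.0 if j == nxt else 0.0)  # d NLL / d logit_j
            W[prev][j] += lr * grad  # ascent on NLL (descent would subtract)
    return W

# toy 3-token vocabulary with uniform initial bigram logits
V = 3
W = [[0.0] * V for _ in range(V)]
forget_seq = [0, 1]  # token sequence standing in for a "concept" to forget

p_before = softmax(W[0])[1]
for _ in range(10):
    unlearn_step(W, forget_seq)
p_after = softmax(W[0])[1]
print(p_before, p_after)  # probability of the forgotten bigram drops
```

In an actual LLM the same update would be applied to the network weights via the autograd of the training framework; the point of the sketch is that ascending the loss on the generated token sequences pushes their likelihood down.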
Submission Number: 44