I'm not familiar with the name Harry Potter: Prompting Baselines for Unlearning in LLMs

Published: 04 Mar 2024, Last Modified: 14 Apr 2024
Venue: SeT LLM @ ICLR 2024
License: CC BY 4.0
Keywords: unlearning, LLM, prompting
TL;DR: We show that prompt-based approaches can perform comparably to fine-tuning on recent LLM unlearning benchmarks and discuss some of the implications of this finding for unlearning pipelines and evaluation.
Abstract: Recent work has demonstrated that fine-tuning is a promising approach to 'unlearn' concepts from large language models. However, fine-tuning can be expensive, as it requires both generating a set of examples and running iterations of fine-tuning to update the model. In this work, we show that simple prompting approaches can achieve unlearning results comparable to fine-tuning methods. We recommend that researchers investigate prompting as a lightweight baseline when evaluating the performance of more computationally intensive fine-tuning approaches. While we do not claim that prompting is a universal solution to the problem of unlearning, our work suggests the need for evaluation metrics that can better distinguish the capabilities of prompting from those of fine-tuning, and highlights scenarios where prompting itself may be useful for unlearning, such as generating examples for fine-tuning, or unlearning when only API access to the model is available.
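For readers who want a concrete picture of such a prompting baseline, here is a minimal sketch. It is illustrative only: the model checkpoint, the prompt wording, and the helper names are our assumptions, not the authors' exact setup.

```python
# A minimal sketch of a prompt-based unlearning baseline, in the spirit of the
# abstract above. The "unlearning" happens purely in the prompt: we instruct
# the model to behave as if it has never encountered the target concept.

from transformers import pipeline

# Assumption: any instruction-following checkpoint works; swap in your own.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

UNLEARN_INSTRUCTION = (
    "You know nothing about the 'Harry Potter' book series. If asked about "
    "its characters, places, or plot, say you are not familiar with them."
)

def answer(question: str) -> str:
    """Prepend the unlearning instruction, then generate a completion."""
    prompt = f"{UNLEARN_INSTRUCTION}\n\nQuestion: {question}\nAnswer:"
    out = generator(prompt, max_new_tokens=64, do_sample=False)
    # The pipeline returns the prompt plus the continuation; keep only the latter.
    return out[0]["generated_text"][len(prompt):].strip()

print(answer("Who is Harry Potter's best friend?"))
```

Note that because the baseline requires only the ability to prepend text to a query, it also applies in the API-only setting the abstract mentions, where fine-tuning is not an option.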
Submission Number: 87