Rethinking Unlearning for Large Reasoning Models

Published: 11 Jun 2025, Last Modified: 11 Jun 2025 · MUGen @ ICML 2025 Poster · CC BY 4.0
Keywords: Unlearning; Large reasoning model
TL;DR: Rethinking unlearning for large reasoning models
Abstract: Recent advances in large reasoning models (LRMs) have enabled strong multi-step reasoning, but existing unlearning methods, designed for standard LLMs, fail to address the unique challenges posed by LRMs. We present the first systematic study of LRM unlearning and show that conventional methods often leave reasoning traces intact despite removing final answers. To overcome this, we propose **R**easoning-aware **R**epresentation **M**isdirection for **U**nlearning ($R^2$MU), which suppresses sensitive reasoning traces while preserving general reasoning ability. Experiments show that $R^2$MU significantly reduces reasoning-trace leakage and performs well on both reasoning and safety benchmarks, offering the first principled solution for mitigating reasoning-trace leakage in LRM unlearning.
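For readers unfamiliar with the representation-misdirection family of unlearning objectives that the method's name alludes to, a minimal NumPy sketch of the two standard loss terms might look like the following: steer forget-set activations toward a scaled random control vector while anchoring retain-set activations to those of a frozen copy of the model. All function names, shapes, and constants here are illustrative assumptions, not the paper's actual $R^2$MU objective.

```python
import numpy as np

def rmu_losses(h_forget, h_retain, h_retain_frozen, control_vec, c=6.5, alpha=1.0):
    """Illustrative RMU-style losses (assumed form, not the paper's R^2MU).

    forget term: push updated-model activations on forget data toward a
                 scaled random control vector (misdirection).
    retain term: keep updated-model activations on retain data close to
                 the frozen model's activations (utility preservation).
    """
    forget_loss = np.mean((h_forget - c * control_vec) ** 2)
    retain_loss = np.mean((h_retain - h_retain_frozen) ** 2)
    return forget_loss + alpha * retain_loss, forget_loss, retain_loss

rng = np.random.default_rng(0)
d = 8
u = rng.normal(size=d)
u /= np.linalg.norm(u)             # fixed unit "control" direction
h_f = rng.normal(size=(4, d))      # hidden states on forget-set inputs
h_r = rng.normal(size=(4, d))      # hidden states on retain-set inputs
total, fl, rl = rmu_losses(h_f, h_r, h_r.copy(), u)
```

In a real setting the losses would be computed on a chosen intermediate layer and backpropagated; the key intuition the sketch shows is that only the forget term pulls activations off-distribution, while the retain term is zero whenever the updated model matches the frozen one.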
Submission Number: 39