Rethinking Unlearning for Large Reasoning Models

Published: 11 Jun 2025, Last Modified: 11 Jun 2025 · MUGen @ ICML 2025 Poster · CC BY 4.0
Keywords: Unlearning; Large reasoning model
TL;DR: Rethinking unlearning for large reasoning models
Abstract: Recent advances in large reasoning models (LRMs) have enabled strong multi-step reasoning, but existing unlearning methods, designed for standard LLMs, fail to address the unique challenges posed by LRMs. We present the first systematic study of LRM unlearning and show that conventional methods often leave reasoning traces intact despite removing final answers. To overcome this, we propose **R**easoning-aware **R**epresentation **M**isdirection for **U**nlearning ($R^2$MU), which suppresses sensitive reasoning traces while preserving general reasoning ability. Experiments show that $R^2$MU significantly reduces reasoning-trace leakage and performs well on both reasoning and safety benchmarks, offering the first principled solution for mitigating reasoning-trace leakage in LRM unlearning.
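For readers unfamiliar with the representation-misdirection family of unlearning objectives that the method's name alludes to, a minimal NumPy sketch of the two standard loss terms might look like the following: steer forget-set activations toward a scaled random control vector while anchoring retain-set activations to those of a frozen copy of the model. All function names, shapes, and constants here are illustrative assumptions, not the paper's actual $R^2$MU objective.

```python
import numpy as np

def rmu_losses(h_forget, h_retain, h_retain_frozen, control_vec, c=6.5, alpha=1.0):
    """Illustrative RMU-style losses (assumed form, not the paper's R^2MU).

    forget term: push updated-model activations on forget data toward a
                 scaled random control vector (misdirection).
    retain term: keep updated-model activations on retain data close to
                 the frozen model's activations (utility preservation).
    """
    forget_loss = np.mean((h_forget - c * control_vec) ** 2)
    retain_loss = np.mean((h_retain - h_retain_frozen) ** 2)
    return forget_loss + alpha * retain_loss, forget_loss, retain_loss

rng = np.random.default_rng(0)
d = 8
u = rng.normal(size=d)
u /= np.linalg.norm(u)             # fixed unit "control" direction
h_f = rng.normal(size=(4, d))      # hidden states on forget-set inputs
h_r = rng.normal(size=(4, d))      # hidden states on retain-set inputs
total, fl, rl = rmu_losses(h_f, h_r, h_r.copy(), u)
```

In a real setting the losses would be computed on a chosen intermediate layer and backpropagated; the key intuition the sketch shows is that only the forget term pulls activations off-distribution, while the retain term is zero whenever the updated model matches the frozen one.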
Submission Number: 39