Keywords: Machine Unlearning; Offline Reinforcement Learning
Abstract: Diffusion policies have recently advanced offline reinforcement learning (RL) by enabling expressive and multi-modal action generation.
As these models move closer to real applications, it becomes important to remove the influence of specific data, whether for privacy, to eliminate unsafe behaviors, or to meet regulatory requirements. Existing unlearning methods, however, cannot handle diffusion-based policies because training influence is spread across the denoising process and reinforced by critic values. In this paper, we present Relative Fisher Forgetting (RFF), the first framework for unlearning in diffusion-based offline RL. RFF removes unwanted data influence through two complementary components: actor unlearning with noise-aware influence gradients scaled by relative Fisher importance, and critic unlearning that suppresses value estimates for forgotten trajectories. To ensure stability, RFF alternates actor and critic updates and introduces gradient-norm control, retain-set regularization, and convergence monitoring. Experiments on MuJoCo control benchmarks in both single-task and multi-task settings show that RFF reliably removes designated trajectories and behaviors while preserving performance on retained data, outperforming retraining and prior unlearning baselines in efficacy and efficiency.
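To make the two components described in the abstract concrete, below is a minimal sketch of what one alternating RFF-style unlearning step could look like, assuming a PyTorch setup. Everything here (the rff_step, denoise_loss, and fisher_diag helpers, the lam, q_floor, and clip hyperparameters, and the simplified one-step noise mixing that stands in for the full denoising chain) is a hypothetical illustration of Fisher-scaled actor unlearning and critic value suppression, not the authors' implementation.

```python
import torch
import torch.nn as nn


def denoise_loss(actor, batch):
    """Toy epsilon-prediction loss for a diffusion policy: the actor predicts the
    noise mixed into an action (a single linear interpolation stands in for the
    full noise schedule)."""
    state, action = batch["state"], batch["action"]
    noise = torch.randn_like(action)
    t = torch.rand(action.shape[0], 1)
    noisy_action = (1.0 - t) * action + t * noise
    pred = actor(torch.cat([state, noisy_action, t], dim=-1))
    return ((pred - noise) ** 2).mean()


def fisher_diag(model, loss):
    """Diagonal Fisher proxy: element-wise squared gradients of the loss."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return [g.detach() ** 2 for g in grads]


def rff_step(actor, critic, forget_batch, retain_batch,
             actor_opt, critic_opt, lam=1.0, q_floor=-10.0, clip=1.0, eps=1e-8):
    """One alternating actor/critic unlearning step (hypothetical sketch)."""
    params = list(actor.parameters())

    # Relative Fisher importance: parameters that matter more for the forget data
    # than for the retain data receive larger unlearning updates.
    f_fisher = fisher_diag(actor, denoise_loss(actor, forget_batch))
    r_fisher = fisher_diag(actor, denoise_loss(actor, retain_batch))

    g_forget = torch.autograd.grad(denoise_loss(actor, forget_batch), params)
    g_retain = torch.autograd.grad(denoise_loss(actor, retain_batch), params)

    actor_opt.zero_grad()
    for p, gf, gr, ff, rf in zip(params, g_forget, g_retain, f_fisher, r_fisher):
        scale = ff / (rf + eps)
        # Ascend on the forget-set denoising loss (Fisher-scaled), descend on the
        # retain-set loss as a regularizer.
        p.grad = -scale * gf + lam * gr
    nn.utils.clip_grad_norm_(params, clip)  # gradient-norm control
    actor_opt.step()

    # Critic unlearning: pull Q-values of forgotten state-action pairs toward a low floor.
    critic_opt.zero_grad()
    q = critic(torch.cat([forget_batch["state"], forget_batch["action"]], dim=-1))
    ((q - q_floor) ** 2).mean().backward()
    nn.utils.clip_grad_norm_(critic.parameters(), clip)
    critic_opt.step()


if __name__ == "__main__":
    s_dim, a_dim, batch = 4, 2, 32
    actor = nn.Sequential(nn.Linear(s_dim + a_dim + 1, 64), nn.ReLU(), nn.Linear(64, a_dim))
    critic = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    opts = (torch.optim.Adam(actor.parameters(), 1e-4), torch.optim.Adam(critic.parameters(), 1e-4))
    make = lambda: {"state": torch.randn(batch, s_dim), "action": torch.randn(batch, a_dim)}
    rff_step(actor, critic, make(), make(), *opts)
```

The sign convention follows from the optimizer's descent step: negating the forget-set gradient yields ascent on the forget-set denoising loss, while the retain-set term acts as a descent regularizer; the clipping call corresponds to the gradient-norm control mentioned in the abstract.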
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 6242