ReTrace: Reinforcement Learning-Guided Reconstruction Attacks on Machine Unlearning

ICLR 2026 Conference Submission14618 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Machine Unlearning, Reinforcement Learning, Reconstruction Attack

Abstract: Machine unlearning has emerged as an essential mechanism for supporting GDPR requirements such as the "right to be forgotten", under which users may revoke consent. However, existing approaches often leave residual traces that make them vulnerable to data reconstruction attacks. In this work, we propose ReTrace, the first reconstruction attack framework that formulates unlearned data recovery on large-scale deep architectures as a reinforcement learning (RL) problem. By treating residual unlearning traces as reward signals, ReTrace guides a generator to actively explore the input space and converge toward the forgotten data distribution. This RL-guided approach enables both instance-level recovery of individual samples and distribution-level reconstruction of unlearned classes. We provide a theoretical foundation showing that the RL objective converges to an exponentially tilted distribution that amplifies forgotten regions. Empirically, ReTrace achieves up to 73.1% instance-level recovery and attains lower FID and KL scores than the state-of-the-art baselines UIA (IEEE S&P 2024) and HRec (NeurIPS 2024). On the challenging task of text unlearning, it improves BLEU scores by nearly 100% over black-box baselines while preserving distributional fidelity, demonstrating that RL can recover even high-dimensional and structured modalities. Furthermore, ReTrace is effective across both convolutional (ResNet) and transformer-based models, with DistilBERT as the largest architecture attacked to date. These results show that current unlearning methods remain vulnerable, highlighting the need for robust and provably private mechanisms.
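The abstract does not state the RL objective itself, so the following is only an illustrative sketch of what "exponentially tilted" means, not the authors' formulation. It assumes the standard KL-regularized objective, maximizing E_q[r(x)] - beta * KL(q || p0), whose maximizer over a finite support is q(x) proportional to p0(x) * exp(r(x) / beta). The names `exponential_tilt`, `p0`, `r`, and `beta`, and the toy reward values, are hypothetical.

```python
import numpy as np

def exponential_tilt(base_probs, rewards, beta=1.0):
    """Return q(x) proportional to p0(x) * exp(r(x) / beta) on a finite support.

    Assumes every entry of base_probs is strictly positive.
    """
    base_probs = np.asarray(base_probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # Work in log space and subtract the max for numerical stability.
    log_w = np.log(base_probs) + rewards / beta
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical toy setup: five input regions with a uniform prior.
# Region 3 stands in for a "forgotten" region with the strongest residual trace.
p0 = np.full(5, 0.2)
r = np.array([0.0, 0.1, 0.0, 2.0, 0.1])

q = exponential_tilt(p0, r, beta=0.5)
print(np.round(q, 3))
```

In this toy case most of the probability mass moves to the high-reward region, which is the sense in which a tilted distribution "amplifies" regions with strong reward. A smaller `beta` sharpens the effect, and a larger `beta` keeps `q` closer to `p0`.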

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 14618
