Keywords: Reinforcement Learning, Privacy Preservation, Adversarial Learning, Curriculum Learning, Behavioural Leakage
TL;DR: This paper introduces PALADIN, a proactive, adversary-in-the-loop Reinforcement Learning framework that embeds a learnable leakage estimator and curriculum-guided reward shaping to jointly optimise task performance and behavioural privacy.
Abstract: Agents trained via \gls{rl} and deployed in sensitive settings, such as finance, autonomous driving, or healthcare, risk leaking private information through their observable behaviour. Even without access to raw data or model parameters, a passive adversary may infer sensitive attributes (e.g., identity, location) by observing the agent’s trajectory. We empirically study this \emph{behavioural leakage} threat and propose \textbf{PALADIN}, a proactive privacy-shaping framework that integrates an adversarial inference model into the training loop. PALADIN jointly trains a transformation network to perturb observations and a co-adaptive leakage predictor, whose output shapes the agent’s reward via a curriculum-guided penalty. This allows the agent to first learn stable task policies, then progressively adapt its behaviour to resist inference.
We evaluate PALADIN on an \gls{av} benchmark and a \gls{ft} benchmark, and audit leakage with held-out adversaries (MLP, GRU, and Transformer) using multiple metrics (confidence, negative log-likelihood, F1, and AUROC). On AV--GPS, PALADIN achieves strong privacy--utility improvements: in a representative MLP--Transformer setting, it increases return from $25.3$ to $40.9$ while reducing Attack~F1 from $0.96$ to $0.14$ and sharply lowering adversary confidence. On the financial benchmark, gains are smaller but still positive: for example, in a GRU--Transformer setting, PALADIN increases return from $0.005$ to $0.041$ while slightly reducing Attack~F1 from $0.75$ to $0.73$ and improving leak\_nll. Overall, our results show that behaviour-aware, curriculum-guided shaping is highly effective for reducing behavioural leakage in AV control and offers a principled, empirically robust alternative to \gls{dp}-style methods for other sequential decision problems.
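The curriculum-guided penalty described in the abstract can be illustrated with a minimal sketch. It assumes a linear ramp after a warm-up phase and a penalty equal to the adversary's confidence in the true sensitive attribute; the function names, schedule shape, and default values are hypothetical and are not taken from the paper.

```python
def curriculum_weight(step, warmup_steps, ramp_steps, max_weight):
    """Hypothetical schedule: zero during warm-up, then a linear ramp to max_weight."""
    if step < warmup_steps:
        return 0.0  # the agent first learns a stable task policy
    progress = min(1.0, (step - warmup_steps) / ramp_steps)
    return max_weight * progress


def shaped_reward(task_reward, leak_confidence, step,
                  warmup_steps=1000, ramp_steps=4000, max_weight=1.0):
    """Task reward minus a curriculum-weighted leakage penalty.

    leak_confidence is assumed to be the leakage predictor's probability
    for the true sensitive attribute, in [0, 1].
    """
    lam = curriculum_weight(step, warmup_steps, ramp_steps, max_weight)
    return task_reward - lam * leak_confidence
```

Under these assumptions the privacy term is absent early in training and grows as the schedule advances, which matches the "learn the task first, then resist inference" ordering stated above.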
Primary Area: reinforcement learning
Submission Number: 21884