PALADIN: Privacy-Aware Learning with Adversary-Detection and INference suppression.

ICLR 2026 Conference Submission 21884 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning, Privacy Preservation, Adversarial Learning, Curriculum Learning, Behavioural Leakage
TL;DR: This paper introduces PALADIN, a proactive, adversary-in-the-loop Reinforcement Learning framework that embeds a learnable leakage estimator and curriculum-guided reward shaping to jointly optimise task performance and behavioural privacy.
Abstract: Agents trained via \gls{rl} and deployed in sensitive settings, such as finance, autonomous driving, or healthcare, risk leaking private information through their observable behaviour. Even without access to raw data or model parameters, a passive adversary may infer sensitive attributes (e.g., identity, location) by observing the agent’s trajectory. We formalise this \emph{behavioural leakage} threat and propose \textbf{PALADIN}, a proactive privacy-shaping framework that integrates an adversarial inference model into the training loop. PALADIN jointly trains a transformation network that perturbs observations and a co-adaptive leakage predictor, whose output shapes the agent’s reward via a curriculum-guided penalty. This allows the agent to first learn stable task policies and then progressively adapt its behaviour to resist inference. We evaluate PALADIN on autonomous navigation and financial trading, auditing leakage against multiple adversary architectures (MLP, GRU, Transformer). PALADIN achieves up to 43\% higher task returns (27.0 vs. 18.9) and 57\% lower adversarial leakage (0.056 vs. 0.131) compared to strong baselines. Even against Transformer adversaries, where leakage confidence remains high, PALADIN raises returns by 38\% (22.8 vs. 15.9) without amplifying leakage, whereas static-noise and \gls{dp} baselines collapse returns (below 7) while failing to reduce leakage. These results highlight the value of embedding adversary-aware privacy shaping directly into \gls{rl} training to mitigate deployment-stage inference threats.
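As a minimal sketch of the curriculum-guided penalty described above (all symbols are introduced here for illustration and are not taken from the paper): let $r_t$ denote the task reward, $\hat{\ell}_t \in [0,1]$ the leakage predictor's confidence given the trajectory prefix, and $\lambda(e)$ a curriculum weight that ramps up with training epoch $e$. The shaped reward could then take the form
\[
\tilde{r}_t \;=\; r_t \;-\; \lambda(e)\,\hat{\ell}_t,
\qquad
\lambda(e) \;=\; \lambda_{\max}\,\min\!\left(1,\ \frac{e}{E_{\mathrm{warmup}}}\right),
\]
so that early in training ($\lambda \approx 0$) the agent optimises the unpenalised task objective, and only later is it progressively pushed to suppress behaviour the co-adapting adversary can exploit.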
Primary Area: reinforcement learning
Submission Number: 21884