Imagined Memorisation: Training-Data Leakage in Model-Based RL World Models

Published: 26 May 2026, Last Modified: 26 May 2026 | ICML 2026 FoGen Workshop Poster | Readers: Everyone | License: CC BY 4.0
Keywords: Memorisation, World Models, Model-Based Reinforcement Learning, Membership Inference, Privacy
TL;DR: We audit memorisation in MBRL world models. A reconstruction-based membership-inference attack (MIA) reaches AUC=0.999 on IRIS/Ms. Pac-Man where loss-based MIA fails, consistent with leakage concentrating in the decoder rather than the likelihood surface.
Abstract: Model-based reinforcement learning (MBRL) agents such as DreamerV3 and IRIS train a \emph{world model} on replay-buffer trajectories and then optimise their policies inside their ``imagination.'' We present the first systematic membership-inference audit of MBRL world models, adapting three attack families (trajectory reconstruction, dynamics-loss MIA, and adversarial-action divergence) to the action-conditioned generative setting. We test for leakage across DreamerV3 and IRIS on four Atari games. On the strongest configuration---IRIS / Ms.\ Pac-Man---reconstruction attains AUC$=0.999$ with Cohen's $d=-4.76$ at horizon $H{=}30$, and TPR$=0.98$ at $1\%$ FPR, exceeding signals typically reported for language and diffusion models; on DreamerV3 / Krull, reconstruction (AUC$=0.682$) and adversarial divergence ($p<10^{-10}$) independently corroborate membership. Nonetheless, the attack families can disagree sharply: on the IRIS / Ms.\ Pac-Man checkpoint that yields near-perfect reconstruction, loss-MIA flags zero members at the same $1\%$-FPR threshold, and five of the eight loss-MIA evaluations score below random. We attribute this disagreement to collection-policy state-space mismatch between members and non-members, which swamps likelihood-based scores while leaving pixel-level signals intact. The implication is that memorisation in pixel-generative world models concentrates in the decoder pathway---the inverse of the language-model setting in which loss-based MIA is the standard tool.
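The three statistics quoted in the abstract (AUC, TPR at a fixed FPR, and Cohen's $d$) can all be computed from per-trajectory attack scores. The following is a minimal sketch and not the authors' evaluation code: the function names, the toy numbers, and the sign convention (lower reconstruction error is taken to indicate membership, which would be consistent with a negative $d$) are assumptions made for illustration.

```python
import numpy as np

def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count half)."""
    m = np.asarray(member_scores, dtype=float)
    n = np.asarray(nonmember_scores, dtype=float)
    greater = (m[:, None] > n[None, :]).sum()
    ties = (m[:, None] == n[None, :]).sum()
    return float((greater + 0.5 * ties) / (m.size * n.size))

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """Fraction of members flagged when at most `fpr` of non-members are flagged."""
    m = np.asarray(member_scores, dtype=float)
    n = np.asarray(nonmember_scores, dtype=float)
    # Allow at most k false positives; flag only scores strictly above the threshold.
    k = min(int(np.floor(fpr * n.size)), n.size - 1)
    threshold = np.sort(n)[::-1][k]
    return float((m > threshold).mean())

def cohens_d(a, b):
    """Standardised mean difference (mean(a) - mean(b)) with a pooled sample variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (a.size + b.size - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

# Hypothetical reconstruction errors (toy values, not results from the paper).
member_err = np.array([0.1, 0.2, 0.3, 0.4])
nonmember_err = np.array([0.5, 0.6, 0.7, 0.8])

# Assumed convention: a higher attack score means "more likely a member",
# so the reconstruction error is negated before scoring.
print("AUC:", auc(-member_err, -nonmember_err))
print("TPR at 1% FPR:", tpr_at_fpr(-member_err, -nonmember_err, fpr=0.01))
print("Cohen's d (member minus non-member error):", cohens_d(member_err, nonmember_err))
```

Reporting TPR at a low FPR alongside AUC matters here because the two can diverge: an attack may rank members well on average yet flag none of them once the threshold is set for $1\%$ false positives, which is the situation the abstract describes for loss-MIA.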
Submission Number: 171