Surrogate-Augmented Deception in Reinforcement Learning (SAD-RL)
Keywords: Reinforcement Learning, Multiagent Systems, Adversarial Learning, Deception, Opponent Modeling, Surrogate Models
TL;DR: SAD-RL trains reinforcement learning agents to mislead adaptive opponents by penalizing surrogate prediction accuracy, achieving robust deception across adversarial environments while maintaining strong task performance.
Abstract: Reinforcement learning (RL) agents in adversarial environments risk being modeled and exploited by opponents that infer their goals or policies. We introduce Surrogate-Augmented Deception in Reinforcement Learning (SAD-RL), a framework that trains agents to resist such modeling by embedding a surrogate predictor into the learning loop and penalizing its accuracy. Rather than emphasizing mere unpredictability, SAD-RL promotes strategic opacity---learning behaviors that remain effective while defying opponent inference. We evaluate SAD-RL in two representative domains: a discrete Adversarial Grid World (AGW) and a continuous Sharks and Minnows (SaM) pursuit–evasion task. Across both settings, SAD-RL agents maintain high task performance while exhibiting measurable deception against surrogate models, achieving a better trade-off between effectiveness and opacity than conventional RL agents. We further analyze the trade-off between goal achievement and opacity, identifying distinct modes of balanced and over-deceptive behavior. Together, these results establish SAD-RL as a general and domain-agnostic approach for inducing emergent deception in reinforcement learning.
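The abstract describes the core mechanism only at a high level: a surrogate predictor is trained alongside the agent, and the agent's reward is penalized by how well that surrogate anticipates its behavior. The sketch below is a minimal illustration of that idea, not the authors' implementation; the names (Surrogate, sad_rl_reward, lambda_opacity) and the specific penalty form (predicted probability of the taken action) are assumptions for illustration.

# Hypothetical sketch of surrogate-penalized reward shaping, per the abstract's
# description of SAD-RL. Not the paper's code; all names and the penalty form
# are illustrative assumptions.
import numpy as np

class Surrogate:
    """Toy opponent model: softmax action predictor over state features."""
    def __init__(self, n_features, n_actions, lr=0.1):
        self.W = np.zeros((n_features, n_actions))
        self.lr = lr

    def predict(self, state):
        # Softmax over action logits for a single state vector.
        logits = state @ self.W
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    def update(self, state, action):
        # One cross-entropy gradient step on the observed (state, action) pair.
        probs = self.predict(state)
        grad = np.outer(state, probs)
        grad[:, action] -= state
        self.W -= self.lr * grad

def sad_rl_reward(task_reward, surrogate, state, action, lambda_opacity=0.5):
    """Penalize the agent when the surrogate assigns high probability to its action."""
    p_predicted = surrogate.predict(state)[action]
    return task_reward - lambda_opacity * p_predicted

# Usage inside an RL loop: after the agent acts, shape its reward with the
# opacity penalty, then let the surrogate learn from the observed behavior.
surrogate = Surrogate(n_features=4, n_actions=3)
state, action, task_reward = np.array([1.0, 0.0, 0.5, -0.2]), 2, 1.0
shaped = sad_rl_reward(task_reward, surrogate, state, action)
surrogate.update(state, action)
print(f"task reward {task_reward:.2f} -> shaped reward {shaped:.2f}")

In this reading, lambda_opacity controls the effectiveness-versus-opacity trade-off the abstract analyzes: small values recover standard RL behavior, while large values can produce the over-deceptive mode in which the agent sacrifices task reward to defeat the surrogate.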
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1569