Keywords: deep reinforcement learning, adversarial attacks, observation perturbation, test-time attacks, strategically-timed attacks
TL;DR: We propose a value-guided adversarial attack for DRL agents that leverages the victim policy’s value function to target critical states, matching prior attack results on Pong with about half as many attacks.
Abstract: Efficient adversarial attacks on deep reinforcement learning agents rely on identifying critical states. Prior work uses learned transition models with environment-specific metrics to predict such states and lure the victim agent toward them. We propose a value-guided attack that integrates the victim policy’s value function, an environment-agnostic metric, into both transition model training and state evaluation. In preliminary experiments on Pong from the Arcade Learning Environment, our method achieves performance degradation comparable to prior work while requiring roughly half as many attacks.
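The core timing idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: all names (`criticality`, `should_attack`, the threshold value) are hypothetical, and it assumes the attacker can query the victim's per-action value estimates at each state.

```python
import numpy as np

def criticality(q_values):
    """Hypothetical criticality score for a state.

    A large gap between the best and worst action values suggests the
    state is critical: forcing a bad action there is especially costly,
    so it is an attractive target for a perturbation.
    """
    q = np.asarray(q_values, dtype=float)
    return float(np.max(q) - np.min(q))

def should_attack(q_values, threshold=0.5):
    """Attack only when the state is critical, conserving the attack budget."""
    return criticality(q_values) > threshold
```

In a value-guided scheme like the one described, a score of this kind could rank states both during transition-model training and at test time, so that perturbations are spent only where they most change the outcome.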
Serve As Reviewer: ~Juan_Cardenas-Cartagena1
Submission Number: 6