Keywords: deep reinforcement learning, adversarial attacks, observation perturbation, test-time attacks, strategically-timed attacks
TL;DR: We propose a value-guided adversarial attack for DRL agents that leverages the victim policy’s value function to target critical states, matching prior attack results on Pong with about half as many attacks.
Abstract: Efficient adversarial attacks on deep reinforcement learning agents rely on identifying critical states. Prior work uses learned transition models with environment-specific metrics to predict such states and lure the victim agent toward them. We propose a value-guided attack that integrates the victim policy’s value function, an environment-agnostic metric, into both transition model training and state evaluation. In preliminary experiments on Pong from the Arcade Learning Environment, our method achieves performance degradation comparable to prior work while requiring roughly half as many attacks.
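The core timing idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: all names (`criticality`, `should_attack`, the threshold value) are hypothetical, and it assumes the attacker can query the victim's per-action value estimates at each state.

```python
import numpy as np

def criticality(q_values):
    """Hypothetical criticality score for a state.

    A large gap between the best and worst action values suggests the
    state is critical: forcing a bad action there is especially costly,
    so it is an attractive target for a perturbation.
    """
    q = np.asarray(q_values, dtype=float)
    return float(np.max(q) - np.min(q))

def should_attack(q_values, threshold=0.5):
    """Attack only when the state is critical, conserving the attack budget."""
    return criticality(q_values) > threshold
```

In a value-guided scheme like the one described, a score of this kind could rank states both during transition-model training and at test time, so that perturbations are spent only where they most change the outcome.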
Serve As Reviewer: ~Juan_Cardenas-Cartagena1
Submission Number: 6