## Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning

TL;DR: We propose WocaR-RL, a strong and efficient robust training framework for RL that directly estimates and optimizes a policy's worst-case reward under bounded $\ell_p$ attacks, without requiring extra samples to learn an attacker.
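To make the "worst-case reward under bounded perturbations" notion concrete, here is a minimal toy sketch. It is not the paper's algorithm (WocaR-RL uses a learned worst-case value estimate rather than sampling); instead it illustrates the quantity being bounded: an attacker perturbs the observation within an $\ell_\infty$ ball, the agent acts greedily on the perturbed observation, and the resulting value is measured on the true state. All names (`q_values`, `worst_case_value`, the linear toy network `W`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "Q-network": a 4-dim state maps to values for 2 actions.
W = rng.normal(size=(4, 2))

def q_values(state):
    """Q(s, ·) for the toy network."""
    return state @ W

def worst_case_value(state, eps, n_samples=256):
    """Crude sampling-based estimate of the worst-case value under
    l_inf-bounded observation perturbations of radius eps.

    The agent picks its action from the *perturbed* observation,
    but the value it actually obtains is Q on the *true* state.
    (Random sampling is a stand-in for the tighter bound-propagation
    estimates used in the robustness literature.)
    """
    deltas = rng.uniform(-eps, eps, size=(n_samples, state.shape[0]))
    actions = np.argmax((state + deltas) @ W, axis=1)  # greedy on perturbed obs
    true_q = q_values(state)                           # evaluated on true state
    return true_q[actions].min()                       # worst case over attacks
```

With `eps = 0` this recovers the clean greedy value; as `eps` grows the estimate can only decrease, which is the gap a robust-training objective tries to close.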