Abstract: We propose a novel approach to evaluating agent policies in uncertain sequential decision-making problems. We study a model-based two-player differential game in which the first player is the agent of interest and the second is a disturbance player that may act against the agent. In particular, we focus on problems where tail events are critical and robustness of the policy against disturbance actions must be guaranteed. We present a framework that relies on backward reachable sets computed by solving the differential game with respect to the disturbance player. The disturbance action is modeled and learned as a set-valued mapping rather than as a deterministic or probabilistic policy. The solution is the disturbance winning set B, in which a predefined metric is violated under all possible policies. By sampling test cases from the complement of B, we obtain challenging scenarios that help evaluate the robustness of policies. We demonstrate our framework on a simple autonomous driving example in which an adaptive cruise control policy is evaluated in a car-following scenario. Our approach to synthesizing realistic and challenging test cases can help systematically evaluate the robustness and safety of policies.