Composing Efficient, Robust Tests for Policy Selection

Dustin Morrill; Thomas Walsh; Daniel Hernandez; Peter R. Wurman; Peter Stone

Published: 08 May 2023, Last Modified: 26 Jun 2023 | UAI 2023 | Readers: Everyone

Keywords: reinforcement learning (RL), test construction, robustness, optimization

TL;DR: This paper introduces RPOSST, an algorithm for composing efficient, robust, reusable tests of candidate deployment RL policies by selecting a small number of the most useful test cases.

Abstract: Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, choosing which policy to actually deploy in the real world requires testing the candidates under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable $k$-of-$N$ robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST finds a small set of test cases that identify high-quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
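
As a rough illustration of the $k$-of-$N$ idea mentioned in the abstract, the sketch below assumes the common reading of a $k$-of-$N$ measure: draw $N$ sampled outcomes and average the $k$ worst of them. The function name `k_of_n_value` and the sample error values are hypothetical, and this is not RPOSST itself, only a minimal sketch of the robustness measure under that assumption.

```python
def k_of_n_value(sampled_errors, k):
    """Average of the k largest errors among the N sampled errors.

    Assumption: larger error is worse, so the "k worst" are the k largest.
    """
    n = len(sampled_errors)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= N")
    # Sort from worst (largest) to best (smallest) and keep the k worst.
    worst_k = sorted(sampled_errors, reverse=True)[:k]
    return sum(worst_k) / k


# Hypothetical errors of one candidate test under N = 4 sampled conditions.
errors = [0.1, 0.4, 0.2, 0.8]
print(k_of_n_value(errors, 1))  # k = 1: judged only by the single worst sample
print(k_of_n_value(errors, 4))  # k = N: judged by the plain average
```

Under this reading, setting $k = 1$ is the most pessimistic choice and $k = N$ recovers the ordinary sample mean, with intermediate values of $k$ trading off between the two.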

Supplementary Material: pdf

Other Supplementary Material: zip
