Keywords: action policies, policy testing
Abstract: Testing was recently proposed as a method to gain trust in learned
action policies in classical planning. Test cases in this setting are
states generated by a fuzzing process that performs random walks from
the initial state. A fuzzing bias attempts to bias these random walks
towards policy bugs, that is, states where the policy performs
sub-optimally. Prior work explored a simple fuzzing bias based on
policy-trace cost. Here, we investigate this topic more deeply. We
introduce three new fuzzing biases based on analyses of policy-trace
shape, estimating whether a trace is close to looping back on
itself, whether it contains detours, and whether its goal-distance
surface fails to decline smoothly. Our experiments with two kinds of
neural action policies show that these new biases improve bug-finding
capabilities in many cases.
Category: Short
Student: Graduate
Supplementary Material: pdf
Submission Number: 220