Abstract: The benchmarks from previous International Planning Competitions are commonly used to evaluate new planning algorithms. Since this benchmark set has grown organically over the years, it has several flaws: it contains duplicate tasks, trivially solvable domains, unsolvable tasks, and tasks with modelling errors. Moreover, differences in domain size complicate the aggregation of results. Most importantly, however, the range of task difficulty is very narrow in many domains. We propose an automated method for creating benchmarks that addresses these issues. To obtain a good scaling in difficulty, we automatically configure the parameters of the benchmark domains. We show that the resulting benchmark set improves empirical comparisons by making it easier to differentiate between planners.
Keywords: benchmarks, evaluation of solvers
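The abstract only summarizes the approach, so the following Python sketch is a rough, non-authoritative illustration of the underlying idea rather than the paper's actual configuration procedure: it scans a single domain-size parameter and keeps values whose runtimes for a baseline planner range from non-trivial up to unsolved within a time limit, giving a benchmark that scales in difficulty. The planner binary "my-planner", the generate_task callback, and all thresholds are hypothetical placeholders.

```python
import subprocess
import time

# Illustration only: "my-planner" is a placeholder planner binary and
# generate_task() is a placeholder task generator; neither comes from the paper.

def run_planner(task_path, timeout):
    """Return the planner's wall-clock runtime in seconds, or None on timeout."""
    start = time.time()
    try:
        subprocess.run(["my-planner", task_path], timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.time() - start

def select_scaling_sizes(generate_task, candidate_sizes, easy_cutoff=1.0, time_limit=300.0):
    """Scan increasing domain-size parameters and keep those whose runtimes
    range from non-trivial (>= easy_cutoff seconds) up to the first size the
    baseline planner cannot solve within the time limit."""
    selected = []
    for size in sorted(candidate_sizes):
        task_path = generate_task(size)        # writes a task file for this size
        runtime = run_planner(task_path, timeout=time_limit)
        if runtime is None:                    # first unsolved size: include it and stop
            selected.append(size)
            break
        if runtime >= easy_cutoff:             # drop trivially solvable sizes
            selected.append(size)
    return selected
```

In this toy version, the selected sizes exclude trivially solvable instances and end just beyond the baseline planner's reach, which is one simple way to realize the difficulty range the abstract asks for.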