Keywords: vision-and-language navigation, efficient evaluation
TL;DR: Evaluation in the real world is often time-consuming and expensive, so we propose a targeted, contrast set-based strategy for efficiently evaluating the linguistic and visual capabilities of an end-to-end VLN policy.
Abstract: Evaluations in the real world are time-consuming, and the relatively small number of experiments that can realistically be run may not explain a policy's performance across the combinatorially large space of instructions that language can specify in complex scenes. In this work, we provide the first real-world evaluation of the Vision-and-Language Navigation in Continuous Environments (VLN-CE) task, a benchmark for evaluating language-guided navigation in simulation. To address the challenges of real-world evaluation in VLN-CE, we propose key desiderata for efficiently evaluating the linguistic and visual components of end-to-end robot policies. Guided by these criteria, we introduce a contrast set-based evaluation that strategically modifies test instructions and scenes to efficiently gain component-level insights about a language-guided policy. We hope to spark discussion within the community on efficient evaluation of language-guided policies to bring these robots closer to real-world deployment.
Submission Number: 46