λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics

Published: 01 Jan 2024 · Last Modified: 16 May 2025 · CoRR 2024 · CC BY-SA 4.0
Abstract: Learning to execute long-horizon mobile manipulation tasks is crucial for advancing robotics in household and workplace settings. However, current approaches are typically data-inefficient, underscoring the need for improved models and for realistically sized benchmarks to evaluate their data efficiency. To address this, we introduce the LAMBDA (λ) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size that is feasible to collect. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and are replay-verifiable, supporting robust learning and evaluation. We leverage LAMBDA to benchmark current end-to-end learning methods and a modular neuro-symbolic approach that combines foundation models with task and motion planning. We find that end-to-end methods, even when pretrained, yield lower success rates, while neuro-symbolic methods perform significantly better and require less data.