RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

ICLR 2026 Conference Submission 22486 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Robot Datasets and Benchmarking, Vision-Language-Action Models, Robot Simulation
TL;DR: RoboCasa365 is a large-scale benchmark of 365 everyday tasks that advances the study and evaluation of generalist robots across diverse environments and data.
Abstract: Recent advances in robot learning have accelerated progress toward generalist robots that can operate across diverse tasks and environments. Yet despite this momentum, it remains difficult to gauge how close we are to this goal, as the field lacks a reproducible, large-scale benchmark for systematic evaluation. To address this gap, we present RoboCasa365, a comprehensive robot simulation benchmark for everyday tasks. Built on the RoboCasa platform, RoboCasa365 introduces 365 everyday tasks across 2,500 diverse kitchen environments and over 2,000 hours of robot interaction data, making it one of the largest and most diverse resources for studying generalist policies. We design the benchmark to support evaluation across key settings, including multi-task learning, robot foundation model training, and lifelong learning. We present extensive experiments with state-of-the-art methods and analyze how task diversity, dataset scale, and environment variation shape generalization. Our results provide new insights into which factors most strongly affect the performance of generalist robots and help inform strategies for future progress in the field.
Primary Area: datasets and benchmarks
Submission Number: 22486