Keywords: Scaling Laws, Embodied AI
Abstract: We introduce an observational method to derive scaling laws for LLM performance on embodied decision-making tasks, allowing us to predict embodied skills, quantify simulation gaps, and assess algorithmic interventions. In contrast to conventional scaling research that trains multiple models from scratch at different scales, our approach bypasses new model training and instead uses publicly available pretrained LLMs to model performance trends across different model families and sizes. Constructing such a unified scaling law across diverse model families is challenging, as these models differ in both training compute efficiency and resulting capabilities. We address this by employing a generalized scaling framework that expresses model performance as a function of a low-dimensional capability space. We first validate this scaling law on the Embodied Agent Interface (EAI) benchmark across 125 LLMs, confirming a predictive accuracy that represents at least a 50% improvement over traditional compute scaling laws. We then find that an LLM's decision-making ability is highly predictable: we can accurately forecast the performance of larger models using data from models as small as 40B parameters, which allows us to quantify both the performance gap between simulation environments and the impact of structured decoding.
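The abstract's "low-dimensional capability space" idea can be illustrated with a minimal sketch: extract latent capability coordinates from standard benchmark scores via PCA, then fit a sigmoidal link from those coordinates to a downstream task metric. All data and variable names below are synthetic stand-ins for illustration, not the authors' actual benchmarks or method.

```python
# Hedged sketch of an observational scaling-law fit (synthetic data, not the paper's pipeline).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: rows = pretrained models, cols = standard benchmark scores.
n_models, n_benchmarks = 40, 6
latent = rng.normal(size=(n_models, 2))             # hidden 2-D capability per model
loadings = rng.normal(size=(2, n_benchmarks))
bench = latent @ loadings + 0.1 * rng.normal(size=(n_models, n_benchmarks))

# Step 1: recover a low-dimensional capability space via PCA on benchmark scores.
centered = bench - bench.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
k = 2
caps = centered @ vt[:k].T                          # capability coordinates per model
caps /= caps.std(axis=0)                            # normalize for stable optimization

# Step 2: fit a sigmoidal link from capabilities to a downstream success rate.
w_true = np.array([1.5, -0.8])
target = 1 / (1 + np.exp(-(caps @ w_true)))         # synthetic downstream metric

X = np.hstack([caps, np.ones((n_models, 1))])       # add bias column
w = np.zeros(k + 1)
for _ in range(2000):                               # plain gradient descent on logistic loss
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (p - target) / n_models

pred = 1 / (1 + np.exp(-(X @ w)))
print("mean abs error:", np.abs(pred - target).mean())
```

Once fitted on smaller models, the same sigmoid can be evaluated at the capability coordinates of larger models to forecast their downstream performance, which is the spirit of the prediction claim in the abstract.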
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 4512