Beyond End-to-End Models: Characterizing the Favorable Scaling of Coordinated Perception and Control
Keywords: Scaling Law; Online Evolutive Learning; Model Complexity
Abstract: End-to-end models have become a dominant paradigm for learning in embodied agents, directly mapping raw sensory inputs to control outputs. However, their tightly coupled nature often leads to unfavorable scaling properties: as model size and environmental complexity grow, their computational and sample efficiency deteriorates sharply, posing a barrier to sustainable, long-term deployment in the real world. In this work, we move beyond the end-to-end approach and systematically characterize the scaling laws of an alternative paradigm: a coordinated architecture that decouples perception and control. In this architecture, perception and decision-making are handled by distinct agents that interact through a closed-loop feedback mechanism, in which task-level feedback from control agents continuously guides the online evolutive learning of the perception modules. We present the first empirical study of the computational scaling of these two paradigms along two critical dimensions: increasing model parameter scale and escalating environmental complexity. Our experiments show that the coordinated perception-and-control architecture achieves nearly linear or sub-linear growth (scaling exponents $\alpha \approx 1.11$ for parameters and $\delta \approx 0.95$ for complexity), whereas end-to-end models exhibit super-linear or exponential growth ($\alpha \approx 1.27$, $\delta \approx 2.70$). These results demonstrate that coordinated perception and control offers a fundamentally more scalable and robust design, providing practical guidance for building next-generation embodied AI systems that are both adaptive and computationally efficient.
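As a point of reference for the exponents quoted above: a scaling exponent $\alpha$ in a power law $C \propto N^{\alpha}$ is conventionally estimated by a least-squares fit in log-log space. The sketch below is a minimal, hypothetical illustration of that procedure (the function name and the synthetic data are ours, not from the paper); it recovers a known exponent of 1.11, the value reported for the coordinated architecture's parameter scaling.

```python
import math

def fit_scaling_exponent(sizes, costs):
    """Estimate alpha in C ~ c0 * N**alpha via ordinary least squares
    on log-transformed data: log C = alpha * log N + log c0."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the log-log regression line is the scaling exponent.
    alpha = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return alpha

# Synthetic (model size, compute cost) pairs generated with alpha = 1.11.
sizes = [1e6, 1e7, 1e8, 1e9]
costs = [2.0 * n ** 1.11 for n in sizes]
print(round(fit_scaling_exponent(sizes, costs), 2))  # 1.11
```

Since the synthetic data follow an exact power law, the fit recovers the exponent up to floating-point error; on real measurements the same fit yields the empirical exponent.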
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 23594