Keywords: autonomous driving, vision-language-action model, efficient inference
Abstract: While the recent Alpamayo1 model sets a new baseline for Vision-Language-Action (VLA) models in autonomous driving, its significant inference latency precludes deployment on edge devices. In this work, we systematically analyze the performance bottlenecks at each inference stage (encode, prefill, decode, and action) of Alpamayo1-10B, revealing that the model suffers from severe spatial redundancy. To address these bottlenecks, we propose FlashDriveVLA, an algorithm-system co-design framework that comprehensively targets the efficiency bottleneck at each stage. FlashDriveVLA reduces end-to-end latency from 769.2 ms to 158.2 ms (a 4.9x speedup), bringing autonomous-driving VLAs substantially closer to real-time inference on edge hardware.
Submission Number: 83