Deliberation Meets Reaction: A Dual-Expert VLA Framework for Autonomous Driving

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: autonomous driving, VLM
Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for end-to-end autonomous driving due to their strong interpretability and generalization. However, their practical deployment is severely hindered by substantial computational cost and high inference latency. This challenge stems from (1) the large number of model parameters needed to maintain world knowledge and (2) intensive Chain-of-Thought (CoT) reasoning used to improve driving performance. Inspired by the observation that experienced drivers engage in intensive deliberation only in unfamiliar or complex situations, we propose DE-Driver, a dual-expert VLA model that adaptively selects which expert to activate and avoids unnecessary reasoning. Specifically, DE-Driver integrates a lightweight reactive expert for swift responses and a powerful deliberative expert for complex reasoning. Depending on the scenario, a scene-aware router dynamically directs layer-wise features to the appropriate expert. The selected expert then determines whether to generate CoT reasoning, balancing inference efficiency against driving performance. Experimental results on the closed-loop Bench2Drive benchmark show that DE-Driver achieves driving performance on par with state-of-the-art methods while significantly improving inference efficiency.
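The abstract's two-stage control flow (a scene-aware router picking an expert, then the expert deciding whether to emit CoT) can be sketched as follows. This is a minimal illustrative sketch based only on the abstract: the function names, the scene-complexity proxy, and the threshold logic are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of DE-Driver-style adaptive routing.
# All names and the complexity heuristic below are illustrative assumptions.

def reactive_expert(features):
    # Lightweight expert: fast direct action output, no chain-of-thought.
    return {"action": sum(features) / len(features), "cot": None}

def deliberative_expert(features):
    # Heavy expert: may additionally produce CoT reasoning for complex scenes.
    action = sum(f * f for f in features) / len(features)
    cot = "step-by-step reasoning trace"  # placeholder for a generated CoT
    return {"action": action, "cot": cot}

def scene_router(features, threshold=0.5):
    # Scene-aware router: a scalar complexity score selects which expert
    # is activated (here, feature spread as a toy complexity proxy).
    complexity = max(features) - min(features)
    return "deliberative" if complexity > threshold else "reactive"

def de_driver_step(features, threshold=0.5):
    # One inference step: route, then let the chosen expert decide on CoT.
    if scene_router(features, threshold) == "deliberative":
        return deliberative_expert(features)
    return reactive_expert(features)

# A calm scene takes the fast reactive path; a complex one triggers deliberation.
print(de_driver_step([0.2, 0.3, 0.25])["cot"])             # None
print(de_driver_step([0.1, 0.9, 0.4])["cot"] is not None)  # True
```

In a real VLA model the router would operate on layer-wise features inside the transformer and the experts would share a backbone, but the efficiency argument is the same: most steps never pay the deliberative expert's cost.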
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 23338