CortexVLA: Bridging the Gap between Cognition and Action via Function Calling

ICLR 2026 Conference Submission 17143 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Vision-Language-Action Model, Robot Manipulation, Tool Learning, Large Language Model
TL;DR: A modular VLA framework that leverages LLM tool-calling to achieve robust and scalable ultra-long-horizon task execution with failure recovery.
Abstract: Vision-Language-Action (VLA) models have shown promise for embodied intelligence, but they often struggle with long-horizon tasks due to error accumulation or planning failures. To address these challenges, we propose CortexVLA, a novel paradigm that bridges cognition and action by leveraging large language model (LLM) function calling. CortexVLA consists of three modular components: the Central Cortex, an LLM-based cognitive hub for planning and function calling; the Visual Cortex, which provides perception through callable vision tools; and the Motor Cortex, which exposes robotic action control as functions. To improve robustness and enable recovery from execution errors, we further propose Cortex-PPO, a reinforcement learning (RL) algorithm that trains CortexVLA to make optimal function calls while supporting failure recovery. We provide theoretical analyses that further demonstrate the soundness and generalization abilities of Cortex-PPO. Comprehensive experiments demonstrate the effectiveness of CortexVLA on ultra-long-horizon tasks. In our main experiment, CortexVLA achieves an average success rate of 85.40%. More importantly, it sustains a 72.73% success rate with an average sub-task length of 11.55 on the most challenging 14-sub-task setting, whereas end-to-end VLA baselines fail beyond 3 or 4 steps. In a flexible manufacturing scenario with 31 sub-tasks, CortexVLA achieves an 81.25% success rate with an average sub-task length of 26.69, demonstrating strong scalability and adaptability. Code will be released after publication.
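The abstract describes a cognitive hub that dispatches named function calls to perception and control modules. The paper's actual interfaces are not public; the following is a minimal, hypothetical sketch of that tool-calling pattern, with illustrative tool names (`detect_object`, `grasp`) and a simple error-reporting convention standing in for the failure-recovery behavior the abstract claims.

```python
# Hypothetical sketch of LLM function-calling orchestration as described in
# the abstract. All names and interfaces here are illustrative assumptions,
# not the paper's actual API.

from typing import Any, Callable, Dict


class CortexHub:
    """Minimal function-calling hub: registers tools and dispatches calls."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        # Unknown tools and runtime failures are reported back to the
        # planner rather than raised, so it can re-plan and recover.
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool {name!r}"}
        try:
            return {"ok": True, "result": self._tools[name](**kwargs)}
        except Exception as exc:
            return {"ok": False, "error": str(exc)}


hub = CortexHub()
# Stand-ins for a "Visual Cortex" perception tool and a "Motor Cortex" action.
hub.register("detect_object", lambda label: {"label": label, "pose": (0.1, 0.2, 0.0)})
hub.register("grasp", lambda pose: f"grasped at {pose}")

# An LLM planner would emit calls like these; here we run them directly.
seen = hub.call("detect_object", label="cup")
done = hub.call("grasp", pose=seen["result"]["pose"])
failed = hub.call("open_gripper")  # unregistered tool -> recoverable error
```

The key design choice sketched here is that every tool call returns a structured success/failure record instead of raising, which is what lets a planning loop observe a failed sub-task and retry or re-plan mid-trajectory.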
Primary Area: applications to robotics, autonomy, planning
Submission Number: 17143