CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
Keywords: LVLMs, agent, computer use agent
TL;DR: We introduce CODA, a framework that uses decoupled reinforcement learning to train specialist planners by separating high-level planning from low-level execution, then merges their knowledge to create a powerful generalist agent.
Abstract: Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in novel software: they require both long-horizon planning grounded in software domain knowledge and precise, fine-grained execution. Existing approaches suffer from a trade-off: generalist agents excel at planning but falter in execution, while specialized agents exhibit the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a "planner" and an "actor", but they are typically static and non-trainable, preventing adaptation from experience, a critical limitation given the scarcity of high-quality data for novel software.
To address these limitations, we introduce CODA, a novel and trainable compositional framework that synergizes a generalist planner (Cerebrum) with a specialist executor (Cerebellum), trained via a dedicated two-stage pipeline. The first stage, Specialization, employs a decoupled GRPO approach to train an expert planner for each novel software individually. The second stage, Generalization, aggregates the successful trajectories from all specialized experts; this consolidated, high-quality dataset is then used for supervised fine-tuning (SFT) of the final planner, equipping it with robust, cross-domain generalist capabilities.
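The two-stage pipeline can be sketched as follows. This is a minimal illustrative simulation, not the authors' implementation: all function names, data structures, and the toy rollouts are hypothetical, and the actual GRPO and SFT steps are reduced to placeholders.

```python
# Hypothetical sketch of CODA's two-stage training pipeline.
# Stage 1 (Specialization): a decoupled-GRPO-style loop trains one expert
# planner per target software while the executor (Cerebellum) stays fixed.
# Stage 2 (Generalization): successful trajectories from every specialist
# are pooled into one dataset used to SFT the final generalist planner.

def train_specialist(base_planner, software):
    """Stage 1 placeholder: specialize the planner on one software."""
    planner = dict(base_planner, domain=software)
    # Toy rollouts standing in for GRPO sampling; real training would
    # score groups of rollouts and update only the planner's policy.
    trajectories = [
        {"software": software, "steps": ["plan", "act"], "success": s}
        for s in (True, False, True)
    ]
    return planner, trajectories

def generalize(base_planner, software_list):
    """Stage 2 placeholder: pool positive trajectories, then 'SFT'."""
    pooled = []
    for sw in software_list:
        _, trajs = train_specialist(base_planner, sw)
        pooled += [t for t in trajs if t["success"]]  # keep successes only
    # Real pipeline: supervised fine-tuning of the planner on `pooled`.
    return {"planner": "generalist", "sft_data": pooled}

agent = generalize({"name": "Cerebrum"}, ["software_A", "software_B"])
```

The key design choice the sketch mirrors is decoupling: only the planner is optimized per domain, so its specialized knowledge can later be consolidated without retraining the executor.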
Evaluated on the ScienceBoard benchmark, which features diverse novel software, our framework significantly outperforms the baseline and establishes a new state of the art (SOTA) among open-source models, with strong generalization to novel software and unseen executors such as code agents.
All the code and models will be made publicly available to foster further research.
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3273