EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

Published: 23 May 2026, Last Modified: 23 May 2026, Venue: ICML 2026 AIWILD, Readers: Everyone, License: CC BY 4.0
Keywords: Computer Use Agents, Verifiable Synthesis, Learning from Experience, Vision Language Model, GUI Agent
TL;DR: We introduce EvoCUA, which establishes a new open-source SOTA on the OSWorld benchmark with a 56.7% success rate by leveraging a self-sustaining evolutionary paradigm that optimizes agents through scalable, verifiable synthetic experience.
Abstract: The development of native computer-use agents (CUA) represents a significant leap in multimodal AI. However, their potential is currently bottlenecked by the constraints of static data scaling. Existing paradigms, which rely primarily on passive imitation of static datasets, struggle to capture the intricate causal dynamics inherent in long-horizon computer tasks. In this work, we introduce EvoCUA, a native agentic model that integrates data generation and policy optimization into a self-sustaining evolutionary cycle. This approach employs a verifiable synthesis engine to autonomously generate diverse tasks with executable validators, alongside a scalable infrastructure that orchestrates tens of thousands of sandbox rollouts for mass experience acquisition. To internalize this experience, our iterative evolving learning strategy reinforces successful routines while transforming failure trajectories into rich supervision through error analysis and self-correction. EvoCUA achieves a 56.7% success rate on the OSWorld benchmark, establishing a new open-source state-of-the-art. It outperforms the previous best open-source model, OpenCUA-72B (45.0%), and surpasses leading closed-weights models such as UI-TARS-2 (53.1%). These results demonstrate the generalizability of the evolving paradigm across foundation models of varying scales, establishing a robust and scalable path for advancing native agent capabilities.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 22