Submission Track: Paper Track (up to 8 pages)
Keywords: computer use agent, knowledge, multimodal
TL;DR: To close the gap between external knowledge and practical computer-use environments, we propose UI-Evol, an evolution mechanism that learns from recorded trajectories.
Abstract: External knowledge has played a crucial role in the recent development of computer-use agents.
We identify a critical knowledge-execution gap: retrieved knowledge often fails to translate into effective real-world task execution. Our analysis shows that even with 90\% correct knowledge, agents achieve only a 41\% execution success rate.
To bridge this gap, we propose $\textbf{UI-Evol}$, a plug-and-play module for autonomous GUI knowledge evolution. UI-Evol consists of two stages: a $\textit{Retrace Stage}$ that extracts faithful, objective action sequences from actual agent-environment interactions, and a $\textit{Critique Stage}$ that refines existing knowledge by comparing these sequences against external references.
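A toy sketch of how the two stages could fit together follows; it is an illustration under stated assumptions, not the UI-Evol implementation. Assumptions: a trajectory is a list of hypothetical `Step` records carrying an `executed` flag, knowledge is a plain list of action strings, the retraced run completed the task (so observed actions are trusted over the reference), and Python's `difflib.SequenceMatcher` alignment stands in for what would realistically be a model-based comparison. The names `Step`, `retrace`, and `critique` are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Step:
    """One recorded agent-environment interaction (hypothetical schema)."""
    action: str      # e.g. "click(Displays)"
    executed: bool   # whether the action actually took effect


def retrace(trajectory: list[Step]) -> list[str]:
    """Toy Retrace Stage: keep only actions that actually took effect,
    giving an objective action sequence grounded in the real interaction."""
    return [step.action for step in trajectory if step.executed]


def critique(knowledge: list[str], retraced: list[str]) -> list[str]:
    """Toy Critique Stage: align external knowledge with the retraced
    sequence and refine the knowledge where they disagree.
    Assumes the retraced run succeeded, so observed actions win."""
    refined: list[str] = []
    matcher = SequenceMatcher(a=knowledge, b=retraced, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            refined.extend(knowledge[i1:i2])  # knowledge confirmed by execution
        elif tag in ("replace", "insert"):
            refined.extend(retraced[j1:j2])   # prefer what actually worked
        # "delete": knowledge steps never observed in execution are dropped
    return refined
```

For example, if the reference says `click(Display)` but the environment only responded to `click(Displays)`, `critique` keeps the surrounding confirmed steps and swaps in the executed action. A real system would also need to handle failed trajectories, which this sketch deliberately ignores.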
We conduct comprehensive experiments on the OSWorld benchmark using the state-of-the-art Agent S2.
Our results demonstrate that UI-Evol not only significantly boosts task performance but also addresses a previously overlooked issue, the high behavioral standard deviation of computer-use agents, thereby substantially improving agent reliability.
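One way to quantify behavioral standard deviation is sketched below, assuming it refers to the spread of benchmark success rate across repeated runs of the same agent; the exact metric used in the paper may differ. `outcomes`, `run_success_rates`, and `reliability_summary` are hypothetical names.

```python
from statistics import mean, stdev


def run_success_rates(outcomes: list[list[bool]]) -> list[float]:
    """outcomes[r][t] is True if run r solved task t (hypothetical layout).
    Returns the success rate of each run over all tasks."""
    return [sum(run) / len(run) for run in outcomes]


def reliability_summary(outcomes: list[list[bool]]) -> tuple[float, float]:
    """Mean and sample standard deviation of success rate across runs.
    statistics.stdev needs at least two runs."""
    rates = run_success_rates(outcomes)
    return mean(rates), stdev(rates)
```

Two agents can share the same mean success rate while differing sharply in standard deviation; the lower-deviation agent is the more reliable one, which is the property the abstract highlights.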
Submission Number: 43