Efficient Agent Training for Computer Use

Yanheng He; Jiahe Jin; Pengfei Liu

Published: 26 Jan 2026, Last Modified: 11 Apr 2026

Venue: ICLR 2026 Poster

License: CC BY 4.0

Keywords: Agents; Computer Use; Large Language Models; Vision Language Models

TL;DR: PC Agent-E demonstrates efficient agent training: a small set of human trajectories, augmented with Claude 3.7 Sonnet, yields a 141% relative improvement and surpasses Claude 3.7 Sonnet by 10% in relative terms on WindowsAgentArena-V2.

Abstract: Scaling up high-quality trajectory data has long been a critical bottleneck for developing human-like computer use agents. We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations. Starting with just 312 human-annotated computer use trajectories, we augment them by synthesizing diverse alternative action decisions with Claude 3.7 Sonnet. Trained on these enriched trajectories, our PC Agent-E model achieves a 141% relative improvement and even surpasses Claude 3.7 Sonnet by 10% in relative terms on WindowsAgentArena-V2, an improved benchmark that we also release. By integrating robust human computer use skills with automated AI data synthesis, our method not only brings substantial improvements over training on human trajectories alone, but also significantly surpasses direct distillation from Claude 3.7 Sonnet.
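The abstract reports its gains in relative terms, so a short sketch of that arithmetic may help when reading the numbers. The function name and the example scores below are illustrative assumptions, not values from the paper. A 141% relative improvement means the new score is 2.41 times the reference score, and a 10% relative gain means 1.10 times.

```python
def relative_improvement(new_score: float, reference_score: float) -> float:
    """Return the gain of new_score over reference_score, as a percentage of reference_score."""
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    return (new_score - reference_score) / reference_score * 100.0


# Hypothetical success rates, chosen only to illustrate the arithmetic;
# they are not results reported in the paper.
reference = 20.0
improved = 30.0
print(f"{relative_improvement(improved, reference):.0f}% relative improvement")
```

Note that a relative figure says nothing about the absolute gap: the same percentage can correspond to a small or a large difference in success rate, depending on the reference score.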

Supplementary Material: pdf

Primary Area: foundation or frontier models, including LLMs

Submission Number: 15462
