OS-Catalyst: Advancing Computer-Using Agents Efficiency through Adaptive Action Compression

Xinfeng Yuan; Qiushi Sun; Yinghao Chen; Rui Li; Xuetian Chen; Siyu Yuan; Xintao Wang; Zichen Ding; Zonglin Li; Biqing Qi; Deqing Yang

08 Sept 2025 (modified: 06 Jan 2026) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: computer use, GUI agent, task efficiency

Abstract: Driven by advances in Vision-Language Models (VLMs), computer-using agents have recently demonstrated remarkable capabilities in complex reasoning, software control, and the automation of digital workflows. However, the existing step-by-step paradigm requires extensive interaction with the model, and the resulting query latency emerges as a key bottleneck for real-world adoption. To address this limitation, we propose that agents should be able to output a sequence of actions after each observation, enabling efficient execution without constant model queries. In this work, we introduce OS-Catalyst, a method that transforms standard computer-using models into agents with the capability of action sequence prediction. To enable this, we design a data collection pipeline tailored for compressed action trajectories in computer-using environments. Building on this pipeline, we construct a large-scale dataset within the WorkArena benchmark and train computer-using agents for action sequence prediction. Through extensive experiments, we show that OS-Catalyst enables up to 50% faster task completion on office-related benchmarks without sacrificing success rate.

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 3176
