OS-Catalyst: Advancing Computer-Using Agents Efficiency through Adaptive Action Compression

ACL ARR 2026 January Submission2815 Authors

03 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Computer-using agent

Abstract: Driven by advances in Vision-Language Models (VLMs), computer-using agents have recently demonstrated remarkable capabilities in complex reasoning, software control, and the automation of digital workflows. However, the prevailing step-by-step paradigm requires a model query at every step, and the resulting latency is a key bottleneck for real-world adoption. To address this limitation, we propose that agents output a sequence of actions after each observation, enabling efficient execution without constant model queries. In this work, we introduce OS-Catalyst, a method that transforms standard computer-using models into agents capable of action sequence prediction. To this end, we design a data collection pipeline tailored to compressed action trajectories in computer-using environments. Building on this pipeline, we construct a large-scale dataset within the WorkArena benchmark and train computer-using agents for action sequence prediction. Extensive experiments show that OS-Catalyst enables up to 50% faster task completion on office-related benchmarks without sacrificing success rate.

Paper Type: Long

Research Area: AI/LLM Agents

Research Area Keywords: Computer-using agent

Languages Studied: English

Submission Number: 2815
