Difficulty-Aware Reasoning for Mobile GUI Automation via Reinforcement Fine-Tuning

Jiafu Chen; Rui Lv; Hongyi Jing; Ziqiang Dang; Shuo Fang; Chenguang Ma; Lei Zhao; Jiajie Teng

Submitted to ICLR 2026. 16 Sept 2025 (modified: 11 Feb 2026). License: CC BY 4.0

Keywords: GUI agent, reinforcement fine-tuning

Abstract: Automating GUI tasks remains challenging because of layout complexity, element density, and intent ambiguity, all of which demand effective and efficient reasoning at each operation. Existing agents typically apply a uniform chain-of-thought (CoT) reasoning process to every action, a one-size-fits-all approach that incurs unnecessary computational overhead and can even degrade performance on trivial steps. To address this, we introduce AdaGUI-R1, a GUI agent that pioneers a difficulty-aware reasoning paradigm by dynamically modulating its reasoning depth based on action complexity. Our methodology consists of two stages: reasoning inducing and reasoning enhancing. During reasoning inducing, we introduce a self-supervised mechanism to generate high-quality, difficulty-aware reasoning trajectories. Fine-tuning on this curated data endows the agent with the fundamental capability to adjust its reasoning depth according to action complexity. Subsequently, during reasoning enhancing, the Group Adaptive Policy Optimization (GAPO) algorithm is applied to improve reasoning performance. It leverages an adaptive thought reward to encourage thinking on challenging steps, and a novel exploration reward with a difficulty-aware Gaussian bandwidth to improve action accuracy. Extensive experiments demonstrate that AdaGUI-R1 sets a new state of the art: it reduces unnecessary reasoning tokens by 40% while improving action accuracy by 5%, underscoring the power of adaptive reasoning in GUI automation.
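The abstract names the two reward terms used in GAPO but does not give their form. The sketch below is one hypothetical reading, not the authors' implementation. It assumes that difficulty is a score in [0, 1], that the exploration reward is a Gaussian kernel over the pixel distance between the predicted and target click points, and that the bandwidth widens as difficulty grows. The function names, the `base_sigma` and `threshold` parameters, and the reward values are all illustrative assumptions.

```python
import math


def exploration_reward(pred, target, difficulty, base_sigma=20.0):
    """Gaussian-shaped reward on click distance (hypothetical form).

    Assumption: the bandwidth sigma scales with difficulty, so harder
    steps tolerate larger deviations from the target point.
    """
    sigma = base_sigma * (1.0 + difficulty)
    dx = pred[0] - target[0]
    dy = pred[1] - target[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def thought_reward(used_reasoning, difficulty, correct, threshold=0.5):
    """Adaptive thought reward (hypothetical form).

    Assumption: a correct action earns full reward when reasoning is
    used on a hard step or skipped on an easy one, and a reduced reward
    otherwise. Incorrect actions earn nothing.
    """
    if not correct:
        return 0.0
    is_hard = difficulty >= threshold
    return 1.0 if used_reasoning == is_hard else 0.5
```

Under these assumptions, a click that misses the target by the same distance earns more reward on a hard step than on an easy one, and reasoning on a trivial step is penalized relative to acting directly.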

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 6643
