Keywords: R1-Zero-Like Training, GUI Agents, Grounding
TL;DR: This paper investigates what kind of R1-Zero-like training is suitable for grounding tasks in GUI agents.
Abstract: Recent Graphical User Interface (GUI) agents replicate the R1-Zero paradigm, coupling online Reinforcement Learning (RL) with explicit chain-of-thought reasoning before object grounding, and thereby achieve substantial performance gains. In this paper, we first conduct an extensive analysis of three key components of that training pipeline: input design, output evaluation, and policy update. Each reveals distinct challenges that arise when general-purpose RL is applied without adaptation to GUI grounding tasks. Input design: Current templates encourage the model to generate chain-of-thought reasoning, but longer chains unexpectedly lead to worse grounding performance. Output evaluation: Reward functions based on hit signals or box area allow models to exploit box size, leading to reward hacking and poor localization quality. Policy update: Online RL tends to overfit easy examples because of biases in response length and sample difficulty, leaving harder cases under-optimized. To address these issues, we propose three targeted solutions. First, we adopt a $\textbf{Fast Thinking Template}$ that encourages direct answer generation, reducing excessive reasoning during training. Second, we incorporate a box size constraint into the reward function to mitigate reward hacking. Third, we revise the RL objective by adjusting length normalization and adding a difficulty-aware scaling factor, enabling better optimization on hard samples. Our $\textbf{GUI-G1-3B}$, trained on 17K public samples with Qwen2.5-VL-3B-Instruct, achieves $\textbf{90.3\%}$ accuracy on ScreenSpot and $\textbf{37.1\%}$ on ScreenSpot-Pro. This surpasses all prior models of similar size and even outperforms the larger UI-TARS-7B, establishing a new state of the art in GUI agent grounding.
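To make the reward-hacking concern concrete, the following is a minimal sketch of how a hit-only reward can be satisfied by a box of almost any size, and how an added size term might discourage that. It is not the paper's reward function: the names `hit_reward`, `size_reward`, `combined_reward`, the width/height ratio used as the size term, and the `size_weight` parameter are all hypothetical choices made for illustration.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def hit_reward(pred: Box, gt: Box) -> float:
    """Return 1.0 if the center of the predicted box lies inside the
    ground-truth box, else 0.0. The size of `pred` does not matter here."""
    cx = (pred[0] + pred[2]) / 2.0
    cy = (pred[1] + pred[3]) / 2.0
    inside = gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]
    return 1.0 if inside else 0.0


def size_reward(pred: Box, gt: Box) -> float:
    """Hypothetical size term in [0, 1]: product of the width ratio and the
    height ratio (smaller over larger), so it equals 1.0 only when the
    predicted and ground-truth boxes have the same width and height."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    if min(pw, ph, gw, gh) <= 0:
        return 0.0  # degenerate box: no size credit
    return (min(pw, gw) / max(pw, gw)) * (min(ph, gh) / max(ph, gh))


def combined_reward(pred: Box, gt: Box, size_weight: float = 0.5) -> float:
    """Hit signal plus a weighted size term (the weight is an assumption)."""
    return hit_reward(pred, gt) + size_weight * size_reward(pred, gt)


if __name__ == "__main__":
    gt = (100.0, 100.0, 200.0, 140.0)
    tiny = (149.0, 119.0, 151.0, 121.0)  # 2x2 box centered on the target
    print(hit_reward(tiny, gt), hit_reward(gt, gt))
    print(combined_reward(tiny, gt), combined_reward(gt, gt))
```

Under the hit-only signal, the tiny box and the exact box score the same, which is the kind of loophole the abstract describes. With the illustrative size term added, the exact box scores strictly higher.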
Primary Area: Other
Submission Number: 22787