K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control

Published: 26 Jan 2026 · Last Modified: 11 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: LLM/VLM Agent, Self-Evolving Agent, Mobile Device Control, Decision-Making
Abstract: Existing mobile device control agents often perform poorly on complex tasks that require long-horizon planning and precise operations, typically because they lack relevant task experience or are unfamiliar with skill execution. We propose $\textbf{K}^2$-$\textbf{Agent}$, a hierarchical framework that models human-like cognition by separating and co-evolving declarative ("knowing what") and procedural ("knowing how") knowledge for planning and execution. K²-Agent's high-level reasoner is bootstrapped from a single demonstration per task and runs a Summarize–Reflect–Locate–Revise (SRLR) loop to distill and iteratively refine task-level declarative knowledge through self-evolution. The low-level executor is trained with our curriculum-guided Group Relative Policy Optimization (C-GRPO), which (i) constructs a balanced sample pool using decoupled reward signals and (ii) employs dynamic demonstration injection to guide the model in autonomously generating successful trajectories for training. On the challenging AndroidWorld benchmark, K²-Agent achieves a new $\textbf{state of the art}$ with a $\textbf{76.1\% success rate}$, ranking $\textbf{1st}$ among all methods $\textbf{using only raw screenshots and open-source backbones}$. K²-Agent also shows strong dual generalization: its high-level declarative knowledge transfers across diverse base models, while its low-level procedural skills achieve competitive performance on unseen tasks in ScreenSpot-v2 and Android-in-the-Wild (AitW).
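For intuition only, here is a minimal Python sketch of how a Summarize–Reflect–Locate–Revise loop over task-level declarative knowledge might be organized. This is not the authors' implementation; the abstract gives only the loop's four stages, so every name here (`summarize`, `reflect`, `locate`, `revise`, `srlr_loop`, `rollout_fn`) is a hypothetical placeholder under that assumption.

```python
# Hypothetical sketch of an SRLR-style self-evolution loop.
# All function names and data structures are illustrative placeholders,
# not the paper's actual API.

from dataclasses import dataclass, field

@dataclass
class DeclarativeKnowledge:
    """Task-level 'know-what': distilled steps/constraints for one task."""
    task: str
    notes: list[str] = field(default_factory=list)

def summarize(demonstration: list[str], task: str) -> DeclarativeKnowledge:
    # Summarize: bootstrap knowledge from a single demonstration per task.
    return DeclarativeKnowledge(task=task, notes=[f"step: {a}" for a in demonstration])

def reflect(trajectory: list[str], success: bool) -> str | None:
    # Reflect: diagnose a failed rollout (stub: blame the last action).
    return None if success else f"failure near action '{trajectory[-1]}'"

def locate(knowledge: DeclarativeKnowledge, diagnosis: str) -> int:
    # Locate: find the knowledge entry implicated by the diagnosis
    # (stub: pick the last note).
    return len(knowledge.notes) - 1

def revise(knowledge: DeclarativeKnowledge, idx: int, diagnosis: str) -> None:
    # Revise: patch the located entry with the lesson learned.
    knowledge.notes[idx] += f" [revised: {diagnosis}]"

def srlr_loop(demo: list[str], task: str, rollout_fn, max_rounds: int = 3):
    """Iteratively refine declarative knowledge via self-evolution."""
    knowledge = summarize(demo, task)
    for _ in range(max_rounds):
        trajectory, success = rollout_fn(knowledge)
        diagnosis = reflect(trajectory, success)
        if diagnosis is None:
            break  # task solved; stop refining
        revise(knowledge, locate(knowledge, diagnosis), diagnosis)
    return knowledge
```

In this reading, `rollout_fn` stands in for the low-level executor acting under the current knowledge; the loop only edits the declarative "know-what" and leaves procedural "know-how" (trained separately, e.g. via C-GRPO in the paper) untouched.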
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 981