Keywords: UI recognition, Cognitive modeling, AI agents, Generative code, Grounding
Abstract: Grounding is central to AI agents on smartphones and requires recognizing relevant UI elements on graphical interfaces. However, existing grounding methods typically prioritize either efficiency or accuracy, and struggle to balance both under real-world UI variations. To address this challenge, we adopt a dual-system approach: System 1 efficiently recognizes UI elements using predefined rules, while System 2 provides deeper analytical reasoning when System 1 fails. To bridge the two systems, we propose GroundCoder, a multi-agent system that extracts representative UI features (e.g., visual appearance and layout) based on System 2’s reasoning and generates executable code for System 1. The generated code transfers System 2’s analytical capabilities to System 1 by encoding them as executable rules, enabling fast and efficient recognition of UI elements beyond predefined patterns. To systematically evaluate our approach, we construct Eleva, a dataset of UI elements collected from popular mobile applications, covering diverse devices, display modes, and application modes. Experiments on Eleva show that our method preserves efficiency comparable to rule-based methods while improving recognition accuracy by 34.6% over existing mainstream methods. We further discuss implications for using generative code in UI recognition to support more robust grounding in dynamic mobile environments.
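The dual-system loop described in the abstract — fast rule execution in System 1, with System 2 generating new executable rules on failure — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; all names (`UIElement`, `system1`, `system2`, the rule registry) are hypothetical, and the System 2 stub stands in for the analytical reasoning and code generation performed by GroundCoder's agents.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class UIElement:
    text: str
    bounds: tuple  # (left, top, right, bottom) screen coordinates

# System 1: a registry of fast, executable recognition rules.
Rule = Callable[[UIElement], Optional[str]]
rules: list[Rule] = []

def system1(elem: UIElement) -> Optional[str]:
    """Try each predefined rule; return the first label that matches."""
    for rule in rules:
        label = rule(elem)
        if label is not None:
            return label
    return None  # recognition failed -> fall back to System 2

def system2(elem: UIElement) -> tuple[str, Rule]:
    """Stand-in for slow analytical reasoning (e.g., an LLM-based agent):
    returns a label plus a new executable rule encoding that analysis."""
    label = "button" if "OK" in elem.text else "label"
    new_rule: Rule = lambda e, lab=label, txt=elem.text: lab if e.text == txt else None
    return label, new_rule

def recognize(elem: UIElement) -> str:
    label = system1(elem)
    if label is None:
        label, new_rule = system2(elem)
        rules.append(new_rule)  # transfer System 2's analysis to System 1
    return label
```

On the first encounter with an unseen element, `recognize` falls through to System 2 and registers the generated rule; subsequent identical elements are then handled by System 1 alone, which is the efficiency transfer the abstract describes.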
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Human-Computer Interaction, cross-modal information extraction, AI agents
Contribution Types: Approaches for low-compute settings, efficiency; Data resources
Languages Studied: English, Chinese
Submission Number: 3539