Demo Abstract: Human Strategy Meets AI Execution: An LLM-Driven Gaming Agent

Published: 01 Jan 2025 · Last Modified: 18 Jul 2025 · SenSys 2025 · CC BY-SA 4.0
Abstract: We introduce an intelligent mobile agent that leverages large language models (LLMs) and computer vision to interpret user commands and autonomously interact with smartphone applications. This agent continuously captures and analyzes screen content, executes actions such as taps, swipes, and text inputs, and intelligently handles ambiguous situations by prompting users for clarification. To advance this vision, we first develop a prototype focused on automating interactions in low-frame-rate mobile games like 2048 and tic-tac-toe. By taking user-defined strategies as input, the agent automates game interactions, effectively separating strategic decision-making from physical touch-based inputs. This enhances accessibility for users who cannot physically interact with a phone and for those who prefer focusing on strategy rather than execution.
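To make the separation of strategy from execution concrete, the following is a minimal sketch of the decision step for a game like 2048. All names here (`merge_left`, `apply_move`, `choose_action`) and the preference-order strategy are illustrative assumptions, not the authors' implementation; the actual agent perceives the board from screenshots and consults an LLM, whereas this sketch takes the board state and a user-defined move preference directly.

```python
def merge_left(row):
    """Slide one row left, merging equal adjacent tiles once (2048 rule)."""
    tiles = [t for t in row if t]
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))

def apply_move(board, direction):
    """Return the board after a swipe, without mutating the input."""
    if direction == "left":
        return [merge_left(r) for r in board]
    if direction == "right":
        return [merge_left(r[::-1])[::-1] for r in board]
    cols = [list(c) for c in zip(*board)]
    if direction == "up":
        moved = [merge_left(c) for c in cols]
    else:  # "down"
        moved = [merge_left(c[::-1])[::-1] for c in cols]
    return [list(r) for r in zip(*moved)]

def choose_action(board, preference):
    """Pick the first move in the user's preference order that changes
    the board; None signals an ambiguous/terminal state where the real
    agent would prompt the user for clarification."""
    for direction in preference:
        if apply_move(board, direction) != board:
            return direction
    return None
```

In the deployed agent, `choose_action` would be replaced by an LLM call that applies the user's natural-language strategy, and the returned direction would be translated into a swipe gesture on the device; the `None` branch corresponds to the clarification prompt described above.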