Abstract: Identifying underlying user goals and intents has been recognized as valuable in various settings, such as personalized agents, improved search responses, advertising, and user analytics. In this paper we propose leveraging an additional signal for identifying user intents: observing users' interactions within UI environments. To that end, we introduce the task of goal identification from observed UI trajectories, which aims to infer the user's intended task from their UI interactions. We propose a novel evaluation metric that assesses whether two task descriptions are paraphrases within a specific UI environment. By leveraging the inverse relation with the UI automation task, we utilized Android and web datasets for our experiments. Using our metric and these datasets, we conducted experiments comparing the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro. Our results demonstrate that both Gemini and GPT underperform compared to humans, highlighting significant room for improvement.
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: UI Automation, LLM, Multimodality, Intent Identification, Autonomous UI agents
Contribution Types: Model analysis & interpretability, Position papers
Languages Studied: English
Submission Number: 73