Submission Track: Paper Track (up to 8 pages)
Keywords: Intent Modeling, Human-Computer Interaction, Early Intent Recognition, GUI Agents, Inverse Planning, Theory of Mind, Latent Goal Inference
Abstract: Understanding user intent is essential for building agents that interact well with humans, as it enables personalization, co-creation, and contextual adaptation. However, existing approaches are restricted to text environments, rely on human annotation, or only predict future user actions without reasoning explicitly about user goals. In this work, we introduce EARL (Early Action Reasoning for Latent intent), a theory-of-mind-inspired inference-time algorithm that models user intent as an inverse planning problem, inferring latent goals from observed user actions. EARL hypothesizes potential user intent at multiple stages during task execution, enabling timely intervention and personalization. Evaluating on three diverse benchmarks (Mind2Web, AiTz, and VideoGUI) with two strong LLMs (Gemini-1.5-Pro and GPT-4o), we show that EARL consistently outperforms CoT-based LLM baselines in accurately inferring user intent, especially under partial observations.
Submission Number: 27
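For readers unfamiliar with the inverse planning framing mentioned in the abstract, the following is a minimal toy sketch of the general idea, not EARL itself: a posterior over candidate goals is updated after each observed action, so a hypothesis is available at every stage of a partially observed task. Everything here is an illustrative assumption: the goal names, the action names, the hand-written scores, the Boltzmann-rational (softmax) user model, and the `beta` parameter. The abstract describes an LLM-based method and does not specify any of these details.

```python
import math

# Hypothetical scores: how useful each action is for each candidate goal.
# These numbers are made up for illustration only.
GOAL_ACTION_SCORES = {
    "book_flight": {"open_travel_site": 1.0, "click_flights_tab": 2.0,
                    "click_hotels_tab": 0.0, "type_city": 1.0},
    "book_hotel": {"open_travel_site": 1.0, "click_flights_tab": 0.0,
                   "click_hotels_tab": 2.0, "type_city": 1.0},
}


def softmax_policy(scores, beta=2.0):
    """P(action | goal) under an assumed Boltzmann-rational user model."""
    top = max(scores.values())
    exps = {a: math.exp(beta * (s - top)) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def infer_goal_posteriors(goal_action_scores, observed_actions, beta=2.0):
    """Return the posterior over goals after each observed action.

    Starts from a uniform prior and applies Bayes' rule step by step,
    working in log space for numerical stability.
    """
    goals = list(goal_action_scores)
    log_post = {g: math.log(1.0 / len(goals)) for g in goals}
    history = []
    for action in observed_actions:
        for g in goals:
            policy = softmax_policy(goal_action_scores[g], beta)
            log_post[g] += math.log(policy[action])
        # Normalize to obtain the posterior after this step.
        top = max(log_post.values())
        unnorm = {g: math.exp(lp - top) for g, lp in log_post.items()}
        total = sum(unnorm.values())
        history.append({g: p / total for g, p in unnorm.items()})
    return history


if __name__ == "__main__":
    observed = ["open_travel_site", "click_flights_tab"]
    for step, posterior in enumerate(
            infer_goal_posteriors(GOAL_ACTION_SCORES, observed), start=1):
        print(step, posterior)
```

In this toy setup the first action is equally likely under both goals, so the posterior stays uniform; the second action shifts most of the probability mass to `book_flight`. This illustrates why hypothesizing intent at multiple stages matters under partial observation.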