Shaping Smart Personal Assistants through Generative Interactive Environments for Scalable Design and Evaluation

Ziyi Xuan; Yiwen Wu; Vinod Namboodiri; Yu Yang

Published: 28 Sept 2025, Last Modified: 18 Oct 2025 · SEA @ NeurIPS 2025 Oral · License: CC BY 4.0

Keywords: Conversational Agent, Intelligent Personal Assistant, Intelligent User Simulation, Simulation-based Experimentation, Large Language Models, Human-Computer Interaction

TL;DR: GIDEA uses LLM-based agents to simulate human-assistant interactions for scalable smart personal assistant research. Across ten replicated human-subject studies, it achieves an average semantic similarity of 0.85 with the original findings while reducing evaluation time from months to days.

Abstract: Designing and evaluating smart personal assistants remains difficult because of resource-intensive human-subject requirements, privacy concerns, and complex experimental setups that restrict scalability and reproducibility. Existing simulation platforms often depend on scripted behaviors, which fail to capture the adaptive and personalized interactions that effective assistants require. We introduce GIDEA, a generative simulation platform that leverages LLM-based agents to model realistic human behaviors and interaction dynamics in smart assistant studies. The platform enables systematic scaling of experiments by modularly encoding participants, environments, and protocols into structured LLM prompts. Its design supports rapid iteration across study conditions and integrates Unity-based visualization with virtual reality support for controlled, reproducible experimentation. To demonstrate scalability, we replicate ten published studies on assistant agent design, achieving an average semantic similarity of 0.85 with the original findings. Results show that generative agents approximate human-like responses and can reproduce key outcomes of human-subject experiments. By supporting iterative and large-scale experimentation, GIDEA provides a cost-effective framework for evaluating emergent assistant capabilities, including adaptive reasoning, preference learning, and multi-user coordination.
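The abstract states that GIDEA encodes participants, environments, and protocols as separate modules that are composed into structured LLM prompts, but it does not give the prompt schema. The following is only a minimal sketch of that idea, assuming three hypothetical module types (`Participant`, `Environment`, `Protocol`) and a plain-text prompt with one section per module. The functions `build_prompt` and `expand_study` are likewise illustrative names, not part of GIDEA. The sketch shows why modular encoding helps with scaling: adding one participant profile or one study condition multiplies the number of simulated sessions without rewriting any prompt by hand.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical module types; GIDEA's actual schema is not described in the abstract.
@dataclass(frozen=True)
class Participant:
    name: str
    age: int
    traits: tuple[str, ...]

@dataclass(frozen=True)
class Environment:
    setting: str
    devices: tuple[str, ...]

@dataclass(frozen=True)
class Protocol:
    condition: str  # e.g. the assistant design being tested
    task: str       # what the simulated participant is asked to do

def build_prompt(p: Participant, env: Environment, proto: Protocol) -> str:
    """Render one participant/environment/protocol combination as a sectioned prompt."""
    return "\n".join([
        "## Participant",
        f"Name: {p.name}; Age: {p.age}; Traits: {', '.join(p.traits)}",
        "## Environment",
        f"Setting: {env.setting}; Devices: {', '.join(env.devices)}",
        "## Protocol",
        f"Condition: {proto.condition}",
        f"Task: {proto.task}",
        "Respond in character as the participant.",
    ])

def expand_study(participants, environments, protocols):
    """Yield one prompt per (participant, environment, protocol) cell of the study design."""
    for p, env, proto in product(participants, environments, protocols):
        yield build_prompt(p, env, proto)

if __name__ == "__main__":
    participants = [
        Participant("A", 30, ("curious",)),
        Participant("B", 70, ("cautious", "low vision")),
    ]
    environments = [Environment("smart home kitchen", ("smart speaker",))]
    protocols = [
        Protocol("proactive assistant", "cook a recipe"),
        Protocol("reactive assistant", "cook a recipe"),
    ]
    prompts = list(expand_study(participants, environments, protocols))
    print(len(prompts))  # 2 participants x 1 environment x 2 conditions
    print(prompts[0])
```

Keeping each module immutable (`frozen=True`) means a study condition can be reused across many simulated sessions without accidental edits. Sending each prompt to an LLM agent, and scoring its responses against the original human-subject findings, are outside the scope of this sketch.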

Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.

Submission Number: 138