Slow the Dialogue, Not Just the Robot: Positive Friction for Reliable Grounding and Safe, Embodied Vision-Language Action
Keywords: dialogue system, ambiguity, robots, vision language action, grounding
Abstract: Embodied conversational robots must translate underspecified natural language commands into physical actions where mistakes can be costly or irreversible. Current LLM-based robot systems often act immediately, guessing missing referents, spatial relations, or motion constraints, which leads to task failures and safety risks. In response, we present PONDER, a dialogue architecture that operationalizes positive friction for embodied interaction: when the current visual context admits multiple plausible interpretations, the system inserts targeted clarification questions, explicit assumption statements, or brief confirmation pauses before execution. PONDER runs on a Misty II mobile robot, integrating speech input, a vision-language model, and conversational memory with navigation and perception actions. In a user study, positive friction increases task success from 18.8% to 89.6% and improves user ratings from 1.29 to 3.85 (on a 5-point scale), at an average cost of only 1.14 additional dialogue turns. We further validate these results in a simulated setup across diverse ambiguity types, where PONDER achieves 74.8% success versus 60.3% without friction and substantially outperforms zero-shot baselines (37.8–44.8%). We release an open-source Misty II implementation and our synthetic dialogue dataset to support reproducible research on embodied dialogue.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: dialogue system, ambiguity, robots, vision language action, grounding
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 6627