Keywords: Task-oriented dialogue, MCTS, LLM
Abstract: Proactive task-oriented dialogue is essential for reliable real-world assistants, yet current LLM-based systems are largely reactive and struggle to recover from failed retrieval, ambiguous constraints, and cross-domain dependencies. This paper targets the core gap of enabling effective lookahead decision-making for proactive TOD under limited high-quality supervision and high-variance language rollouts. We propose SMCTS-TOD, which combines act-level open-loop planning with a fast learned success estimator (the Value-LLM) to make lookahead practical: the planner searches over dialogue-act sequences while the Value-LLM provides low-latency, low-variance guidance and supports iterative refinement via self-distillation. Across MultiWOZ 2.0 and SGD, SMCTS-TOD improves interactive goal completion and robustness, scoring higher on success-oriented metrics than strong prompting baselines. Human studies further indicate better dialogue-level usefulness and fewer unreasonable strategy choices. These results suggest that abstract planning paired with fast learned evaluation is a viable and verifiable path to more proactive and robust LLM-based TOD agents.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: task-oriented, dialogue
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 5525