Improving LLM Reasoning under Uncertainty with Coach-Player Multi-agent

Published: 16 Oct 2025, Last Modified: 10 Nov 2025, NeurIPS 2025 ER Workshop, CC BY 4.0
Keywords: LLM, Reasoning, Multi-agent, Collaboration
Abstract: Large language models (LLMs) are increasingly deployed as interactive agents that reason, plan, and act across multiple turns. While reasoning-based paradigms improve problem solving and reflection-based methods enhance robustness, single-agent designs remain limited: they often reinforce their own biases and intervene only after errors have accumulated, leading to instability in long-horizon tasks. We introduce a Player–Coach multi-agent LLM system that enables conditional, real-time self-correction during multi-turn decision making. A Player agent engages with the environment through step-wise reasoning and actions, while a Coach agent is invoked only when the Player's uncertainty is high. The Coach provides targeted metacognitive feedback, such as clarifying objectives or resolving confusion, which is immediately added to the Player's context. To trigger interventions precisely when needed, we develop a composite uncertainty measure combining normalized entropy with a top-two margin. Experiments on ALFWorld show that our approach improves success rates and reduces trajectory length compared to single-agent baselines, without requiring additional training or fine-tuning.
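The abstract names the two ingredients of the uncertainty trigger (normalized entropy and top-two margin) but not their exact combination. A minimal sketch of one plausible form, assuming a linear mix with a hypothetical weight `alpha` and a hypothetical invocation threshold (neither is specified in the abstract):

```python
import math

def composite_uncertainty(probs, alpha=0.5):
    """Combine normalized entropy and top-two margin into one uncertainty score.

    probs: the Player's probability distribution over candidate actions.
    alpha: hypothetical mixing weight; the paper does not specify the exact form.
    """
    # Normalized entropy in [0, 1]: H(p) / log(|A|), maximal for a uniform distribution.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    norm_entropy = entropy / math.log(len(probs)) if len(probs) > 1 else 0.0
    # Top-two margin: gap between the two most likely actions; a small gap means
    # the Player is torn between options, so (1 - margin) is treated as uncertainty.
    top1, top2 = sorted(probs, reverse=True)[:2]
    margin = top1 - top2
    return alpha * norm_entropy + (1 - alpha) * (1 - margin)

# The Coach is invoked only when the score exceeds a threshold (hypothetical value).
THRESHOLD = 0.6
score = composite_uncertainty([0.4, 0.35, 0.15, 0.10])
invoke_coach = score > THRESHOLD
```

Under this sketch a uniform distribution over actions scores 1.0 (maximal uncertainty), while a sharply peaked one scores near 0, so the Coach is consulted only at genuinely ambiguous steps.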
Submission Number: 61