Task Completion Agents are Not Ideal Collaborators

Published: 06 Oct 2025, Last Modified: 04 Nov 2025
Venue: MTI-LLM @ NeurIPS 2025 (Spotlight)
License: CC BY-ND 4.0
Keywords: Language Model based Agents, Human AI Collaboration, Agent Evaluation, Multi-turn Interaction
Abstract: Large Language Model (LLM) agents are increasingly capable of handling complex tasks autonomously, but current development and evaluation practices remain centered on one-shot task completion. This dominant paradigm fails to account for the inherently iterative and collaborative nature of many real-world problems, where human goals are often underspecified and evolve over time. This position paper argues for a shift in focus: from building and assessing task completion agents to developing *collaborative agents*, those evaluated not just by the quality of their final outputs but by how well they engage with and enhance human effort throughout the problem-solving process. To support this shift, we introduce **collaborative effort scaling**, a framework that captures how an agent's utility grows with increasing user involvement. Through case studies and simulated evaluations, we show that state-of-the-art agents often underperform in multi-turn, real-world scenarios, revealing a missing ingredient in agent design: the ability to sustain engagement and scaffold user understanding. Collaborative effort scaling offers a new lens for diagnosing agent behavior and guiding development toward deeper, more adaptive interaction.
Submission Number: 190