Task Completion Agents are Not Ideal Collaborators

Published: 06 Oct 2025, Last Modified: 04 Nov 2025
Venue: MTI-LLM @ NeurIPS 2025 (Spotlight)
License: CC BY-ND 4.0
Keywords: Language Model based Agents, Human AI Collaboration, Agent Evaluation, Multi-turn Interaction
Abstract: Large Language Model (LLM) agents are increasingly capable of handling complex tasks autonomously, but current development and evaluation practices remain centered on one-shot task completion. This dominant paradigm fails to account for the inherently iterative and collaborative nature of many real-world problems, where human goals are often underspecified and evolve over time. This position paper argues for a shift in focus: from building and assessing task completion agents to developing *collaborative agents*, those evaluated not just by the quality of their final outputs but by how well they engage with and enhance human effort throughout the problem-solving process. To support this shift, we introduce **collaborative effort scaling**, a framework that captures how an agent's utility grows with increasing user involvement. Through case studies and simulated evaluations, we show that state-of-the-art agents often underperform in multi-turn, real-world scenarios, revealing a missing ingredient in agent design: the ability to sustain engagement and scaffold user understanding. Collaborative effort scaling offers a new lens for diagnosing agent behavior and guiding development toward deeper, more adaptive interaction.
Submission Number: 190