Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

Published: 28 Mar 2026, Last Modified: 08 May 2026
Venue: AIware 2026
License: CC BY 4.0
Keywords: AI coding agents, human-AI collaboration, software engineering, pull request workflows, empirical study
TL;DR: We analyze 29k+ PRs across five AI coding tools and find that agents mainly initiate work while humans keep merge authority.
Abstract: When AI coding agents open branches and submit pull requests (PRs), two questions shape oversight design: who starts the work (operational agency) and who authorizes its completion (merge governance). We analyze 29,585 PR lifecycles across five tools (OpenAI, Copilot, Devin, Cursor, and Claude Code) using an Initiator $\times$ Approver taxonomy with six interaction scenarios, and we characterize each tool along a Collaborator--Assistant spectrum according to how it redistributes initiative, oversight, and endorsement; merge governance remains predominantly human across all five tools. Lifecycle reconstruction shows how these roles unfold over time. Collaborator tools (Cursor, Devin, and Copilot) concentrate operational initiative in agents that open and carry PR work forward, while humans retain review and endorsement on the path to merge. Assistant tools (OpenAI and Claude Code) leave task direction primarily with humans and provide bounded support within human-led workflows. Across the spectrum, agency and governance decouple: Collaborator workflows are $\geq$96\% agent-initiated, yet terminal merge authority remains almost exclusively human, with agent-classified approvers confined to a small fraction of PRs. Where automation executes a merge, logs record the executor but not the decision-maker, marking a boundary of observation. We contribute the taxonomy, per-tool state machines, and a replication package to support research on automation, oversight, and governance in PR workflows.
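To make the taxonomy concrete, the following minimal Python sketch maps a reconstructed PR lifecycle onto an Initiator $\times$ Approver scenario. It assumes the six scenarios arise from crossing two initiator roles (agent, human) with three terminal outcomes (human-approved, agent-approved, unmerged); the record fields, the scenario labels, and that decomposition are illustrative assumptions, not the paper's actual schema.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PRLifecycle:
    # Hypothetical record reconstructed from a PR event timeline; field
    # names are illustrative, not the paper's actual schema.
    initiator: str            # "agent" or "human": who opened the PR
    approver: Optional[str]   # "agent", "human", or None if never merged

def classify(pr: PRLifecycle) -> str:
    # Cross two initiator roles with three terminal outcomes to obtain
    # six scenarios (an assumed decomposition of the paper's taxonomy).
    if pr.approver is None:
        outcome = "unmerged"
    else:
        outcome = f"{pr.approver}-approved"
    return f"{pr.initiator}-initiated / {outcome}"

# The dominant Collaborator pattern reported in the abstract: an agent
# initiates the PR while a human holds terminal merge authority.
print(classify(PRLifecycle(initiator="agent", approver="human")))
# -> agent-initiated / human-approved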
Revision Summary: Since the original February 15 submission, we have revised the paper to make the framing, empirical interpretation, and validity boundaries clearer. The core dataset and main empirical result are unchanged: we analyze 29,585 pull request lifecycles across five AI coding-agent tools and show that tools differ in how they distribute initiation, review, and merge authority. The final version clarifies the distinctions among operational agency, governance authority, endorsement, and accountability. It explains that Collaborator workflows concentrate operational initiative in agents while humans still retain review, endorsement, and merge governance. It also clarifies that endorsement means meaningful acceptance of a pull request outcome, whereas accountability refers to who remains responsible after the outcome enters the codebase. We also sharpened the interpretation of RQ3: the original version noted that event logs show who executed a merge but not necessarily who made the decision, and the final version extends this boundary to review events as well, since logs can show that a review occurred but not how substantive that review was. Finally, we expanded the threats to validity with a discussion of endorsement measurement and content-level controls. These additions make the paper more precise about what lifecycle logs can support and identify future work on review content, PR complexity, and task type.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public.
Paper Type: Full-length paper (i.e., case study, theoretical, or applied research paper), 8 pages
Reroute: true
Submission Number: 50