How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks

Published: 28 Apr 2026 · Last Modified: 28 Apr 2026 · MSLD 2026 Poster · CC BY 4.0
Keywords: AI agents; LLM token consumption; agentic coding; token efficiency
TL;DR: Agentic coding is highly token-intensive, with large cost variability and no guarantee that more tokens improve accuracy. Models also underestimate their own token usage.
Abstract: Wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that can require large numbers of tokens, three questions naturally arise: (1) Where do AI agents spend their tokens? (2) Which models are more token-efficient? (3) Can LLMs anticipate their token usage before task execution? In this paper, we present the first quantitative study of token-consumption patterns in agentic coding. We analyze trajectories from 8 frontier LLMs on SWE-Bench and evaluate models' ability to predict their own token costs before task execution. We find that: (1) Agentic tasks are uniquely expensive: they consume substantially more tokens (and cost) than code reasoning and code chat, and input tokens, rather than output tokens, are the key driver of overall cost. (2) Token usage is highly variable and inherently stochastic: runs on the same task can differ by up to 30× in total tokens, and higher token usage does not translate into higher accuracy; instead, accuracy often peaks at intermediate cost and degrades at higher cost. (3) Model-to-model token efficiency is governed more by model characteristics than by human-labeled task difficulty, and difficulty labels only weakly align with actual resource expenditure. (4) Frontier models fail to accurately predict their own token usage (showing only weak-to-moderate correlations) and systematically underestimate real token costs. Our study reveals important insights into the economics of AI agents and could inspire new studies in this direction.
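The claim that input tokens dominate agentic cost can be illustrated with simple per-token pricing arithmetic. The sketch below uses hypothetical prices and token counts (not figures from the paper); the point is that a multi-turn agentic loop re-sends its growing context on every turn, so input tokens accumulate far faster than output tokens.

```python
def task_cost(input_tokens: int, output_tokens: int,
              price_in_per_m: float = 3.00,
              price_out_per_m: float = 15.00) -> float:
    """Cost in USD, given per-million-token prices (illustrative only)."""
    return (input_tokens / 1e6) * price_in_per_m \
         + (output_tokens / 1e6) * price_out_per_m

# Hypothetical agentic trajectory: context (history, file contents,
# tool outputs) is re-sent each turn, so input tokens dwarf output tokens.
input_tok, output_tok = 2_000_000, 50_000
cost = task_cost(input_tok, output_tok)
input_share = (input_tok / 1e6) * 3.00 / cost
print(f"total ≈ ${cost:.2f}, input share of cost ≈ {input_share:.0%}")
```

Even with output priced 5× higher per token in this example, input tokens still account for the large majority of total cost, matching the pattern the abstract describes.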
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 171