Abstract: In recent years, large language models have demonstrated remarkable performance across diverse tasks. However, their effectiveness on a given task depends heavily on the prompting strategy used to elicit output, and strategies can vary widely in both performance and token usage. While task performance is often used to judge the success of a prompting strategy, we argue that efficiency, which balances performance against token usage, is a more practical metric for real-world utility. To enable this, we propose Big-$O_{tok}$, a theoretical framework for describing how a prompting strategy's token usage grows, and analyze Token Cost, an empirical measure of tokens used per unit of performance. We apply these to several common prompting strategies to demonstrate their utility and observe that increased token usage yields drastically diminishing performance returns. Our results validate the Big-$O_{tok}$ and Token Cost analyses and reinforce the need for efficiency-aware evaluations.
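For illustration, here is a minimal sketch of how a Token Cost style metric could be computed, assuming Token Cost is simply tokens consumed divided by task performance; the strategy names, token counts, and accuracy values below are hypothetical placeholders, not results from the paper, whose exact definition may differ.

```python
# Hypothetical sketch of a Token Cost computation: tokens consumed per
# unit of task performance. All numbers below are invented placeholders.

strategies = {
    # strategy name: (total tokens used, task accuracy in [0, 1])
    "direct": (1_200, 0.62),
    "chain_of_thought": (9_500, 0.71),
    "self_consistency": (48_000, 0.73),
}

for name, (tokens, accuracy) in strategies.items():
    token_cost = tokens / accuracy  # tokens per unit of performance
    print(f"{name}: {token_cost:,.0f} tokens per unit accuracy")
```

Even with these made-up numbers, the pattern the abstract describes is visible: each additional order of magnitude of tokens buys only a few points of accuracy, so Token Cost rises sharply.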
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=TOvI1r6FIr
Changes Since Last Submission: * Removed the non-anonymous GitHub link. We are very sorry for this oversight on our part.
* Used parenthetical and textual in-text citations more consistently
Assigned Action Editor: ~Atsushi_Nitanda1
Submission Number: 5758