Keywords: LLM Agents, Token Cost
Abstract: LLM-based multi-agent (LLM-MA) systems have demonstrated potential in complex tasks such as reasoning and code generation. However, compared to single-agent systems, LLM-MA systems incur significantly higher inference latency and token costs due to repeated LLM calls. In this work, we identify duplicated tokens as a major contributor to these inefficiencies, acting as a "communication tax" that hinders scalability. To systematically analyze token duplication patterns, we propose AgentTaxo, a taxonomy that categorizes agent roles into Planner, Reasoner, and Verifier across various applications. AgentTaxo dissects inter-agent communication and identifies redundant reasoning results frequently reused for validation. We benchmark and analyze token costs in popular LLM-MA systems, quantifying the impact of this communication tax through experimental evaluation. Our findings provide insights into optimizing efficiency and scalability in LLM-MA architectures.
Submission Number: 18