Keywords: agent, token, cost, efficient
Abstract: The deployment of large language model (LLM)-powered agents for knowledge-intensive and reasoning tasks is often prohibitively expensive, since processing large volumes of evidence incurs massive token costs. Existing techniques such as prompt compression and model routing attempt to reduce token usage, but they often compromise accuracy or fail to capture the fine-grained structure of reasoning tasks. In this work, we introduce E-Agent, a cost-effective framework that leverages the pricing asymmetry of LLMs to significantly reduce monetary cost without sacrificing performance. E-Agent adopts an executor–verifier paradigm: multiple small, locally deployed models act as executors to generate candidate answers, which are then verified by a powerful cloud-based model. This design shifts token consumption from expensive outputs to relatively cheaper inputs. The framework further supports specialized workflows for both retrieval-augmented generation (RAG) and non-RAG tasks, and employs structured outputs to minimize candidate answer length. Experiments on GSM8K, ALFWorld, HotpotQA, and StrategyQA demonstrate that E-Agent reduces token usage by 10\%–50\% compared with strong baselines, while maintaining or even improving accuracy.
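The executor–verifier paradigm described above can be illustrated with a minimal sketch. All function names and the toy models below are hypothetical placeholders for illustration, not E-Agent's actual API: the point is that the cloud model only reads candidates (cheap input tokens) and emits a short verdict (few expensive output tokens).

```python
# Minimal sketch of the executor-verifier paradigm, assuming a callable
# interface for each model. Names here are illustrative, not E-Agent's API.

def run_executors(question, executors):
    """Each small, locally deployed model proposes a short candidate answer."""
    return [model(question) for model in executors]

def verify(question, candidates, cloud_model):
    """The cloud model reads all candidates as input (cheap tokens) and
    returns only the index of the best one (minimal output tokens)."""
    prompt = question + "\nCandidates:\n" + "\n".join(
        f"{i}: {c}" for i, c in enumerate(candidates)
    )
    return cloud_model(prompt)

# Toy stand-ins: two "executors" and a "verifier" that picks candidate 0.
executors = [lambda q: "4", lambda q: "4"]
cloud_model = lambda prompt: 0
candidates = run_executors("What is 2 + 2?", executors)
best = candidates[verify("What is 2 + 2?", candidates, cloud_model)]
```

This sketch omits the structured-output formatting and the RAG/non-RAG workflow specialization that the abstract mentions; it only shows how verification shifts cost from output tokens to input tokens.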
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 18095