Keywords: LLM Reasoning, LLM Efficiency
Abstract: Allocating more compute to large language model (LLM) reasoning has generally been shown to improve effectiveness, but it also increases inference time. In contrast, humans perform tasks faster and better with increased experience and exposure. Hence, this paper investigates the question: Can LLMs also become faster at reasoning through recurrent exposure to relevant tasks, and if so, how can this be achieved? To address these questions, we first systematically formalize the problem setting of LLM reasoning speedup along the dimensions of task similarity and compute budget calculation. We then propose SpeedupLLM, a theoretically guaranteed framework that implements and benchmarks such reasoning speedup behaviour through adaptive compute allocation and memory mechanisms. We further conduct comprehensive experiments to benchmark this behaviour across different reasoning tasks, question similarity levels, memory methods, and reasoning methods. Results show that LLMs can generally reason faster with past experience, achieving up to a 56% reduction in compute cost when equipped with appropriate memory and reasoning methods.
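To make the abstract's core idea concrete, below is a minimal illustrative sketch of adaptive compute allocation coupled with a memory mechanism: a best-of-N reasoning loop whose sample budget shrinks when a similar past question is found in memory. All names (Memory, answer_with_adaptive_budget, the similarity threshold, the budget rule) are hypothetical assumptions for illustration, not the paper's actual SpeedupLLM implementation.

    # Hypothetical sketch (Python): memory-driven adaptive compute allocation.
    # The llm argument is any callable mapping a prompt string to an answer string.
    import difflib
    from dataclasses import dataclass, field

    @dataclass
    class Memory:
        episodes: list = field(default_factory=list)  # past (question, answer) pairs

        def retrieve(self, question, threshold=0.8):
            """Return the most similar past episode, or None if nothing is close."""
            best, best_sim = None, 0.0
            for q, a in self.episodes:
                sim = difflib.SequenceMatcher(None, q, question).ratio()
                if sim > best_sim:
                    best, best_sim = (q, a), sim
            return best if best_sim >= threshold else None

        def store(self, question, answer):
            self.episodes.append((question, answer))

    def answer_with_adaptive_budget(question, memory, llm, max_samples=8):
        """Best-of-N reasoning whose sample budget shrinks on a memory hit."""
        hit = memory.retrieve(question)
        # Assumed budget rule: spend one sample when experience exists,
        # the full budget otherwise; n_samples proxies the compute spent.
        n_samples = 1 if hit else max_samples
        context = f"Similar solved example: {hit}\n" if hit else ""
        candidates = [llm(context + question) for _ in range(n_samples)]
        answer = max(set(candidates), key=candidates.count)  # majority vote
        memory.store(question, answer)
        return answer, n_samples

Under this toy budget rule, repeated exposure to similar questions drives the per-question compute toward a single sample, which is the kind of speedup behaviour the paper benchmarks; the paper's actual allocation and memory methods may differ.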
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13732