Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
Abstract: Large Language Models (LLMs) have given rise to a diverse array of reasoning strategies, each with distinct computational requirements. Evaluations that focus solely on performance metrics miss a key factor: the gains in effectiveness that come simply from spending more compute. Overlooking this factor often yields a skewed view of strategy efficiency. This paper introduces a framework that incorporates the compute budget into the evaluation process, enabling a more informative comparison that accounts for both performance and computational cost. Our budget-aware investigation reveals a strong correlation between performance and compute budget, showing that simple strategies like Self-Consistency (SC) can outperform more complex methods once budget is taken into account. We further examine the impact of two specific types of budget, answer generation and evaluation, highlighting the significant role self-evaluation plays in the performance of certain reasoning strategies. We also propose Self-Confidence-weighted Self-Consistency ($SC^2$) as a new baseline and identify a correlation between model calibration and the success of self-evaluation-based strategies. These findings open doors to more efficient budget utilization and may spur the development of more robust and cost-effective reasoning strategies and LLM applications.
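The two baselines named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are invented here, and the assumption is that SC takes a majority vote over sampled answers while $SC^2$ weights each sampled answer's vote by a model-reported confidence score.

```python
from collections import Counter, defaultdict

def self_consistency(answers):
    """Self-Consistency (SC): majority vote over sampled answers."""
    return Counter(answers).most_common(1)[0][0]

def confidence_weighted_sc(answers, confidences):
    """Assumed sketch of Self-Confidence-weighted Self-Consistency (SC^2):
    each sampled answer votes with weight equal to the model's
    self-reported confidence, rather than counting once."""
    scores = defaultdict(float)
    for answer, confidence in zip(answers, confidences):
        scores[answer] += confidence
    return max(scores, key=scores.get)
```

Under this reading, a well-calibrated model can let a single high-confidence answer outvote several low-confidence ones, which is consistent with the abstract's observation that calibration correlates with the success of self-evaluation-based strategies.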
Paper Type: long
Research Area: Generation
Contribution Types: Model analysis & interpretability, Reproduction study
Languages Studied: English