Rational Metareasoning for Large Language Models

Published: 10 Oct 2024, Last Modified: 20 Oct 2024
NeurIPS 2024 Workshop on Behavioral ML
License: CC BY 4.0
Keywords: LLM, Metareasoning, Problem solving, Chain of Thought, Inference Optimization, Value of Computation
TL;DR: This paper presents a method inspired by cognitive science that iteratively trains LLMs to use intermediate reasoning only when it is worth the cost, significantly cutting inference costs without sacrificing performance.
Abstract: Reasoning has emerged as a core technique for improving large language model (LLM) performance across various tasks by using additional inference-time compute. However, as LLMs scale in both size and usage, inference costs are becoming increasingly burdensome. How, then, might we optimize the cost-performance tradeoff of reasoning? This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting, our approach significantly reduces inference costs (47% fewer tokens generated on average) without sacrificing task performance across diverse datasets.
Submission Number: 28
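
The page does not include code, but the abstract's two ingredients, a Value-of-Computation (VOC) reward and an Expert Iteration filter, can be sketched concretely. The following is a minimal illustration under assumed choices: a task reward of 1 for a correct answer, a linear per-token penalty, and the coefficient LAMBDA and all function names (voc_reward, select_expert_examples) are hypothetical, not the authors' implementation.

```python
# Minimal sketch of a VOC-style reward with a linear token-cost penalty.
# LAMBDA and the reward form are assumptions for illustration; the paper's
# actual reward function may differ.

LAMBDA = 1e-3  # assumed cost per generated reasoning token


def voc_reward(is_correct: bool, num_reasoning_tokens: int,
               lam: float = LAMBDA) -> float:
    """Task reward minus the cost of the computation used to obtain it."""
    task_reward = 1.0 if is_correct else 0.0
    return task_reward - lam * num_reasoning_tokens


def select_expert_examples(candidates, lam: float = LAMBDA):
    """Schematic Expert Iteration filter: keep the highest-VOC sample per
    prompt, so fine-tuning favors reasoning only when the extra tokens
    actually improve the outcome.

    `candidates` maps each prompt to a list of tuples
    (is_correct, num_reasoning_tokens, response_text).
    """
    kept = {}
    for prompt, samples in candidates.items():
        best = max(samples, key=lambda s: voc_reward(s[0], s[1], lam))
        if voc_reward(best[0], best[1], lam) > 0:
            kept[prompt] = best[2]  # retain for the next fine-tuning round
    return kept


if __name__ == "__main__":
    # A correct direct answer outscores an equally correct 400-token chain
    # of thought, so the filter prefers skipping unnecessary reasoning.
    print(voc_reward(True, 0))    # 1.0
    print(voc_reward(True, 400))  # 0.6
```

Under this sketch, a chain of thought is retained only when its accuracy benefit exceeds its token cost, mirroring the VOC idea of pricing computation against the improvement it buys.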