Keywords: reasoning, refinement
TL;DR: We propose a reasoning-with-refinement strategy in which necessary questions are asked to decide when an LLM should refine its output.
Abstract: Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their generations and self-improve?
A popular concept, referred to as *self-refinement*, postulates that LLMs can detect and correct the errors in their generations when asked to do so. However, recent empirical evidence points in the opposite direction, suggesting that LLMs often struggle to accurately identify errors when reasoning is involved. To address this, we propose a reasoning-with-refinement strategy called *ART*, which *asks* necessary questions to decide when an LLM should *refine* its output, and then affirms or withholds *trust* in the refinement by ranking it against the initial prediction. On two multistep reasoning tasks, mathematical word problems (GSM8K) and question answering (StrategyQA), *ART* achieves a performance gain of +5 points over self-refinement baselines, while using a much smaller model as the decision maker.
We believe that *ART*, with smaller models making the refinement decisions, can be a cost-effective alternative to fine-tuning LLMs.
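To make the described ask-refine-trust flow concrete, here is a minimal sketch of how such a pipeline could be wired together. The object interfaces and helper names (`generate`, `ask_subquestions`, `is_answered`, `refine`, `trust_score`) are hypothetical placeholders for calls to the main LLM and the smaller decision-making model, not the paper's actual implementation.

```python
# Hypothetical sketch of an Ask-Refine-Trust (ART) loop.
# `llm` is the large generator model; `decider` is the smaller decision-making model.

def art_pipeline(question, llm, decider):
    # Step 1: initial prediction from the LLM.
    initial = llm.generate(question)

    # Step 2 (Ask): the decider asks the sub-questions needed to solve the task
    # and checks whether the initial prediction answers them; unanswered
    # sub-questions signal that refinement is needed.
    subquestions = decider.ask_subquestions(question)
    unanswered = [q for q in subquestions if not decider.is_answered(q, initial)]
    if not unanswered:
        return initial

    # Step 3 (Refine): the LLM revises its answer, conditioned on the
    # unanswered sub-questions as feedback.
    refined = llm.refine(question, initial, unanswered)

    # Step 4 (Trust): the decider ranks the initial and refined predictions
    # and returns whichever scores higher, so a poor refinement is rejected.
    candidates = [initial, refined]
    return max(candidates, key=lambda ans: decider.trust_score(question, ans))
```

The final ranking step is what distinguishes this flow from plain self-refinement: a refinement is only accepted when the decision maker trusts it over the original prediction.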
Submission Number: 26