Large Language Models Can Self-Improve

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Main
Submission Type: Regular Long Paper
Submission Track: Commonsense Reasoning
Submission Track 2: Machine Learning for NLP
Keywords: Large Language Models, Commonsense Reasoning, Arithmetic Reasoning, Chain of Thought
TL;DR: We demonstrate that the reasoning ability of Large Language Models can be improved by fine-tuning on self-generated rationale-augmented answers, yielding significant performance gains on various reasoning tasks without external supervision.
Abstract: Large Language Models (LLMs) have achieved excellent performance in various tasks. However, fine-tuning an LLM requires extensive supervision. Humans, on the other hand, can improve their reasoning abilities by self-thinking without external inputs. In this work, we demonstrate that an LLM is also capable of self-improving with only unlabeled datasets. We use a pre-trained LLM to generate ``high-confidence'' rationale-augmented answers for unlabeled questions using Chain-of-Thought (CoT) prompting and self-consistency, and fine-tune the LLM using those self-generated solutions as target outputs. We show that without any ground-truth label, our approach improves the general reasoning ability of a 540B-parameter LLM (74.4\%$\rightarrow$82.1\% on GSM8K, 90.0\%$\rightarrow$94.4\% on OpenBookQA, and 63.4\%$\rightarrow$67.9\% on ANLI-A3) and can also be adapted to extreme low-resource cases where even training questions and CoT prompts are limited. We conduct ablation studies and show that fine-tuning on diverse reasoning paths is critical for self-improvement.
Submission Number: 4277
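
The selection step the abstract describes is straightforward to sketch: sample multiple CoT rationales per unlabeled question at nonzero temperature, majority-vote the final answers (self-consistency), and keep only rationale-augmented answers that agree with a high-confidence majority as fine-tuning targets. The sketch below is a minimal illustration under assumptions; the `sample_cot` callable and the confidence threshold are hypothetical stand-ins, not the paper's exact interface.

```python
# Minimal sketch of self-consistency filtering for self-generated
# fine-tuning data. `sample_cot` is an assumed stand-in for one
# temperature-sampled CoT generation from the pre-trained LLM.
from collections import Counter

def select_high_confidence(question, sample_cot, num_samples=32, threshold=0.6):
    """Return (question, rationale, answer) triples backed by a majority vote.

    sample_cot(question) -> (rationale: str, answer: str), one stochastic
    Chain-of-Thought sample from the model.
    """
    samples = [sample_cot(question) for _ in range(num_samples)]
    # Self-consistency: vote over the final answers across samples.
    votes = Counter(answer for _, answer in samples)
    top_answer, count = votes.most_common(1)[0]
    if count / num_samples < threshold:
        return []  # discard questions whose answers the model disagrees on
    # Keep every rationale that reached the majority answer; training on
    # these diverse reasoning paths is what the ablations identify as critical.
    return [(question, rationale, top_answer)
            for rationale, answer in samples
            if answer == top_answer]
```

The retained triples would then be formatted as rationale-augmented targets and used to fine-tune the same model, closing the self-improvement loop without any ground-truth labels.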