A Study on Improving Reasoning in Language Models

Published: 27 Oct 2023, Last Modified: 24 Apr 2024 (ICBINB 2023)
Keywords: reasoning, self-improving language models, reinforcement learning, iterative supervised learning, reward-conditional supervised learning
TL;DR: We compare methods for improving reasoning in smaller language models: filtered supervised learning, reinforcement learning, and distillation.
Abstract: Accurately carrying out complex reasoning is a crucial component of deployable and reliable language models. While current language models can exhibit this capability with few-shot guidance, accurate reasoning is largely restricted to larger model sizes. In this work, we explore methods for improving the reasoning capabilities of smaller language models, which are more deployable than their larger counterparts. Specifically, we study variations of supervised learning, online reinforcement learning with PPO, and distillation from larger models. Surprisingly, on reasoning tasks such as CommonsenseQA and GSM8K, we find that simple filtered supervised learning often outperforms reward-conditioned supervised learning, and that simpler iterative supervised learning performs on par with online reinforcement learning.
Submission Number: 34
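To make the abstract's terminology concrete, the sketch below illustrates what iterative filtered supervised learning typically looks like: sample rationales from the model, keep only those whose final answer matches the gold label, fine-tune on the kept set, and repeat. This is a minimal illustration under assumptions, not code from the paper; the helpers `sample_rationales`, `extract_answer`, and `finetune` are hypothetical placeholders.

```python
# Minimal sketch of iterative filtered (rejection-sampling) supervised learning.
# `sample_rationales`, `extract_answer`, and `finetune` are hypothetical helpers,
# not functions from the paper or from any specific library.

def collect_correct_rationales(model, train_set, samples_per_question=8):
    """Sample rationales and keep only those ending in the correct answer."""
    kept = []
    for question, gold_answer in train_set:
        for rationale in sample_rationales(model, question, n=samples_per_question):
            # Filter step: discard chains of thought with a wrong final answer.
            if extract_answer(rationale) == gold_answer:
                kept.append((question, rationale))
    return kept

def iterative_filtered_sft(model, train_set, num_rounds=3):
    """Alternate between sampling/filtering and supervised fine-tuning."""
    for _ in range(num_rounds):
        correct_rationales = collect_correct_rationales(model, train_set)
        model = finetune(model, correct_rationales)  # standard next-token SFT
    return model
```

A single round of this loop corresponds to plain filtered supervised learning; running multiple rounds gives the iterative variant that the abstract reports as competitive with online reinforcement learning.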