Abstract: Chain-of-Thought (CoT) prompting strengthens the reasoning abilities of Large Language Models (LLMs), eliciting step-by-step solutions to complex reasoning tasks. However, these capabilities emerge only in models with billions of parameters, which represents a barrier to entry for many users who are forced to operate at a smaller model scale, i.e., with Small Language Models (SLMs). Although many companies release reduced-parameter versions of the same model families, these smaller models sometimes produce misleading answers and are unable to deliver CoT reasoning.
In this paper, we investigate the alignment of reasoning abilities from larger to smaller language models. In particular, using the Instruction-tuning-CoT approach, that is, instruction tuning on CoT demonstrations, we analyze the impact on downstream abilities. Hence, we instruct a smaller language model using outputs generated by more robust models, whether belonging to the same family or not, and we analyze the impact and divergences. Results obtained on four question-answering benchmarks show that SLMs can be instructed to reason via CoT demonstrations produced by LLMs.
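The pipeline the abstract describes can be sketched as a data-preparation step: CoT demonstrations generated by a larger "teacher" model are packaged into instruction-tuning examples for a smaller "student" model. The field names and prompt template below are illustrative assumptions, not the paper's exact format.

```python
def build_cot_example(question: str, teacher_cot: str, answer: str) -> dict:
    """Wrap a teacher-generated CoT rationale into an instruction-tuning pair.

    The student is fine-tuned to produce `target` given `prompt`, so it
    learns to emit the step-by-step reasoning chain, not just the answer.
    """
    prompt = (
        "Answer the question. Reason step by step before giving the final answer.\n"
        f"Question: {question}"
    )
    # Concatenate the rationale and the final answer as the training target.
    target = f"{teacher_cot}\nTherefore, the answer is {answer}."
    return {"prompt": prompt, "target": target}


# Hypothetical demonstration, as a teacher LLM might have generated it.
demo = build_cot_example(
    question="If a pen costs 2 dollars, how much do 3 pens cost?",
    teacher_cot="Each pen costs 2 dollars. 3 pens cost 3 * 2 = 6 dollars.",
    answer="6 dollars",
)
print(demo["target"].splitlines()[-1])  # → Therefore, the answer is 6 dollars.
```

A dataset of such pairs would then be fed to a standard supervised fine-tuning loop over the smaller model.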