Improving Small and Large Language Models Alignment on Chain-of-Thought Reasoning using Curriculum Learning

ACL ARR 2024 April Submission 56 Authors

11 Apr 2024 (modified: 17 May 2024) · ACL ARR 2024 April Submission · CC BY 4.0
Abstract: Chain-of-Thought (CoT) prompting empowers the reasoning abilities of Large Language Models (LLMs), prompting them to solve complex reasoning tasks step by step. However, these capabilities emerge only in models with billions of parameters, which represents a barrier to entry for the many users who are forced to operate at a smaller model scale, i.e., with Small Language Models (SLMs). Although many companies release LLMs of the same family with a reduced number of parameters, these smaller models sometimes produce misleading answers and are unable to deliver CoT reasoning. In this paper, we propose a method to enable CoT reasoning in SLMs by introducing two novel mechanisms. First, we align CoT abilities via Instruction-tuning with the support of CoT Demonstrations "taught" by LLM teachers to SLM students. Second, we use Curriculum Learning (CL), a pedagogically motivated learning method that strengthens the Instruction-tuning phase. We then analyze the impact on downstream abilities across four question-answering benchmarks. The results show that SLMs can be instructed to reason via Demonstrations produced by LLMs. Going a step further, we conceive SLMs as human learners and expose them to a CL-based teaching approach, obtaining better downstream performance.
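As a rough illustration of the pipeline the abstract describes (teacher-generated CoT demonstrations, ordered from easy to hard, used to instruction-tune a student SLM), the following minimal Python sketch assumes a Hugging Face causal LM as the student and rationale length as a stand-in difficulty measure; the model name, data format, and ordering criterion are illustrative assumptions, not the authors' released implementation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder student SLM; the paper's student models may differ.
student_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(student_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy CoT demonstrations "taught" by an LLM teacher (illustrative strings only).
demos = [
    {"question": "2 + 3 = ?",
     "rationale": "2 plus 3 equals 5.",
     "answer": "5"},
    {"question": "If a pen costs 2 dollars, how much do 4 pens cost?",
     "rationale": "Each pen costs 2 dollars, so 4 pens cost 4 * 2 = 8 dollars.",
     "answer": "8"},
]

# Curriculum Learning: present easier demonstrations first.
# Difficulty is approximated here by rationale length; the paper may use a
# different pedagogically motivated ordering.
demos.sort(key=lambda d: len(d["rationale"].split()))

model.train()
for demo in demos:
    text = (f"Question: {demo['question']}\n"
            f"Let's think step by step: {demo['rationale']}\n"
            f"Answer: {demo['answer']}")
    batch = tokenizer(text, return_tensors="pt", padding=True)
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

In practice one would train for multiple epochs over mini-batches drawn in curriculum order; the single pass above only shows where the easy-to-hard ordering enters the Instruction-tuning loop.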
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient Instruction-tuning, Small Language Models
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches to low-compute settings/efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Section 2 Permission To Publish Peer Reviewers Content Agreement: Authors grant permission for ACL to publish peer reviewers' content
Submission Number: 56