Transferring Reasoning Capabilities between LLMs operating via Curriculum Learning Policy

TMLR Paper 4228 Authors

17 Feb 2025 (modified: 13 Mar 2025) · Under review for TMLR · CC BY 4.0
Abstract: In-context reasoning methods, exemplified by Chain-of-Thought (CoT), empower the reasoning abilities of large language models (LLMs), enabling them to solve complex reasoning tasks step by step. Nevertheless, the capacity to deliver robust CoT explanations emerges only in models with billions of parameters, representing a barrier to entry for the many users forced to operate at a smaller model scale, i.e., with Small Language Models (SLMs). Even though many companies release LLMs of the same family with a reduced number of parameters, these models sometimes produce misleading answers and fail to deliver accurate step-wise reasoned answers. This paper proposes a method to transfer step-wise reasoning to SLMs via Instruction-tuning (IT) on synthetic demonstrations delivered in a pedagogically motivated manner. In particular, we first align step-wise reasoning capabilities via IT, using demonstrations "taught" by LLM teachers to SLM students. We then apply Curriculum Learning (CL), a pedagogically motivated learning strategy that improves the IT phase. We analyse the impact on downstream performance across four question-answering benchmarks. The results show that SLMs can be instructed to reason via demonstrations delivered by LLMs. We take the research a step further: conceiving SLMs as human learners, we expose them to a CL teaching-based approach, obtaining better downstream performance.
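As a rough illustration of the pipeline the abstract describes, the sketch below shows teacher-generated CoT demonstrations ordered easy-to-hard before instruction-tuning the student. This is a minimal sketch under stated assumptions, not the paper's implementation: the difficulty proxy (counting rationale steps) and the names `Demonstration`, `build_curriculum`, `instruction_tune`, and `student_step` are all hypothetical.

```python
# Minimal sketch: a teacher LLM emits CoT demonstrations, which are
# sorted easy-to-hard (here, by a simple proxy: the number of
# reasoning steps in the rationale) before instruction-tuning the
# student SLM. All names below are hypothetical stand-ins, not the
# paper's actual method.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Demonstration:
    question: str
    rationale: str  # step-by-step CoT text produced by the teacher LLM
    answer: str

    @property
    def difficulty(self) -> int:
        # Curriculum proxy: one "step" per rationale line.
        return self.rationale.count("\n") + 1


def build_curriculum(demos: List[Demonstration]) -> List[Demonstration]:
    # Easy-to-hard ordering is the core of the curriculum policy.
    return sorted(demos, key=lambda d: d.difficulty)


def instruction_tune(student_step: Callable[[str, str], float],
                     demos: List[Demonstration],
                     epochs: int = 1) -> None:
    # student_step(prompt, target) is a hypothetical training hook that
    # runs one gradient update on the SLM and returns the loss.
    for _ in range(epochs):
        for d in build_curriculum(demos):
            prompt = f"Q: {d.question}\nLet's think step by step."
            target = f"{d.rationale}\nAnswer: {d.answer}"
            student_step(prompt, target)
```

In this reading, swapping `build_curriculum` for a random shuffle would recover plain instruction-tuning, which is the baseline the curriculum variant is compared against.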
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=3nUbt6O24d
Changes Since Last Submission: Dear Editors, we are sending the correct version of our contribution. It was previously desk-rejected due to formatting problems.
Assigned Action Editor: ~Huaxiu_Yao1
Submission Number: 4228