Cascaded Chain-of-Thoughts Distillation: Distilling Reasoning Capabilities from Large Language Models
Abstract: Large language models (LLMs) have shown remarkable reasoning capabilities at increased scales, spurring efforts to distill such capabilities into smaller, compact models via teacher-student learning. Previous works either directly fine-tune student models on teacher-generated Chain-of-Thoughts (CoTs) data or learn such data within a multi-task framework. However, these methods struggle with CoTs generalization due to spurious correlations between questions and answers, as well as inconsistencies in the logic connecting the rationales to the answers. In this paper, we propose \textbf{Cas}caded \textbf{Co}Ts \textbf{D}istillation (CasCoD), a straightforward but effective method to address these issues. Specifically, we decompose the full CoTs distillation into two comprehensive tasks and learn them in a cascaded manner by sharing the input prefix. By separating and cascading the tasks, CasCoD not only enables the student model to concentrate on reasoning without being distracted by the answers but also encourages faithful reasoning in students, thus enhancing the generalizability of CoTs. Extensive experiments and further analysis demonstrate the effectiveness of CasCoD on both in-domain and out-of-domain benchmark reasoning datasets.
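To make the cascaded decomposition concrete, below is a minimal sketch of how a two-task training step with a shared input prefix might look for a HuggingFace-style causal LM student. The prompt templates, the weighting coefficient alpha, and the helper names are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch: cascaded two-task CoT distillation step (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder student model
model = AutoModelForCausalLM.from_pretrained("gpt2")


def masked_lm_loss(prefix: str, target: str) -> torch.Tensor:
    """Cross-entropy on the target tokens only; the shared prefix is masked out.

    Note: masking length is approximated by tokenizing the prefix separately,
    which is a common simplification for a sketch like this.
    """
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    full_ids = tokenizer(prefix + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prefix_ids.shape[1]] = -100  # ignore loss on the prefix tokens
    return model(input_ids=full_ids, labels=labels).loss


def cascod_step(question: str, rationale: str, answer: str, alpha: float = 0.5):
    # Task 1: question -> rationale (reasoning learned without the answer in view).
    loss_rationale = masked_lm_loss(
        f"Question: {question}\nRationale:", f" {rationale}"
    )
    # Task 2: question + rationale -> answer (same input prefix, cascaded).
    loss_answer = masked_lm_loss(
        f"Question: {question}\nRationale: {rationale}\nAnswer:", f" {answer}"
    )
    return alpha * loss_rationale + (1 - alpha) * loss_answer
```

In this sketch, the two losses are simply combined with a scalar weight; how the tasks are actually weighted and scheduled follows the paper, which this snippet does not reproduce.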
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English