MCC-KD: Multi-CoT Consistent Knowledge Distillation

Hongzhan Chen; Siyue Wu; Xiaojun Quan; Rui Wang; Ming Yan; Ji Zhang

MCC-KD: Multi-CoT Consistent Knowledge Distillation

Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang

Published: 07 Oct 2023, Last Modified: 01 Dec 2023EMNLP 2023 FindingsEveryoneRevisionsBibTeX

Submission Type: Regular Long Paper

Submission Track: Efficient Methods for NLP

Submission Track 2: Theme Track: Large Language Models and the Future of NLP

Keywords: Knowledge Distillation, Chain of Thought, Reasoning

Abstract: Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both the diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill the reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among their predictions by minimizing the bidirectional KL-divergence between the answer distributions. We conduct comprehensive experiments to investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results demonstrate that MCC-KD achieves superior performance on in-distribution datasets and exhibits a strong generalization ability on out-of-distribution datasets.

Submission Number: 615

Loading