CoCPerD: Mitigating Spurious Correlations Between Question and Answer via Chain-of-Thought Correctness Perception Distillation

ACL ARR 2025 February Submission 2124 Authors

14 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Large language models (LLMs) have demonstrated extraordinary reasoning prowess, but their deployment costs are prohibitively high. Previous research has therefore endowed small language models (SLMs) with reasoning abilities by fine-tuning them on Chain-of-Thought (CoT) data generated by LLMs. However, during this learning process, SLMs may capture spurious correlations between questions and answers, making it difficult to ensure the soundness of the generated rationales and their consistency with the predicted answers. In this work, we propose Chain-of-Thought Correctness Perception Distillation (CoCPerD), a method that perceives the correctness of each rationale and applies a distinct training strategy accordingly. Specifically, we collect both correct and erroneous rationales from the teacher and student models. During training, we label each rationale with a status string indicating whether it is correct or erroneous. If the rationale is correct, the student model predicts the answer; if it is erroneous, the student model corrects it. This encourages the student model to rely on valid reasoning paths for answer prediction and to learn from mistakes, thereby enhancing the faithfulness and soundness of its generated rationales. Experiments show that CoCPerD is effective on both in-distribution (IND) and out-of-distribution (OOD) reasoning benchmarks.
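The abstract describes the correctness-perception recipe only in prose. As a concrete illustration, the minimal sketch below shows one plausible way the correctness-tagged training pairs could be assembled; the status strings, field names, and grading heuristic (Rationale, CORRECT_TAG, build_training_pair, exact-match answer comparison) are assumptions made for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of CoCPerD-style training-pair construction.
# All names, tags, and the correctness check below are illustrative
# assumptions; the paper does not specify an implementation here.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Rationale:
    question: str
    cot: str          # chain-of-thought text from teacher or student
    answer: str       # answer produced alongside the rationale
    gold_answer: str  # reference answer used to judge correctness

CORRECT_TAG = "[RATIONALE: CORRECT]"
ERRONEOUS_TAG = "[RATIONALE: ERRONEOUS]"

def is_correct(r: Rationale) -> bool:
    # Simple proxy: a rationale counts as correct if its answer
    # matches the gold answer (exact match after normalization).
    return r.answer.strip().lower() == r.gold_answer.strip().lower()

def build_training_pair(
    r: Rationale, corrected_cot: Optional[str] = None
) -> Tuple[str, str]:
    """Return an (input, target) pair for fine-tuning the student.

    Correct rationale   -> the student learns to predict the answer.
    Erroneous rationale -> the student learns to rewrite it into a
    corrected rationale (supplied externally, e.g. a verified CoT).
    """
    if is_correct(r):
        src = f"{CORRECT_TAG}\nQuestion: {r.question}\nRationale: {r.cot}"
        tgt = f"Answer: {r.gold_answer}"
    else:
        src = f"{ERRONEOUS_TAG}\nQuestion: {r.question}\nRationale: {r.cot}"
        tgt = f"Corrected rationale: {corrected_cot}\nAnswer: {r.gold_answer}"
    return src, tgt

if __name__ == "__main__":
    good = Rationale("What is 2 + 3?", "2 plus 3 equals 5.", "5", "5")
    bad = Rationale("What is 2 + 3?", "2 plus 3 equals 6.", "6", "5")
    print(build_training_pair(good))
    print(build_training_pair(bad, corrected_cot="2 plus 3 equals 5."))
```

Under this reading, the status tag switches the student's objective per example: correct rationales supervise answer prediction conditioned on a valid reasoning path, while erroneous ones supervise rationale correction, so the student cannot shortcut directly from question to answer without attending to the rationale's content.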
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: distillation
Contribution Types: Approaches to low-resource settings
Languages Studied: English
Submission Number: 2124