Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs

ACL ARR 2024 December Submission 607 Authors

14 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · License: CC BY 4.0
Abstract: Requiring a large language model (LLM) to generate intermediate reasoning steps, known as a Chain of Thought (CoT), has been shown to be an effective way of boosting performance. Previous approaches have focused on generating multiple independent CoTs and combining them through ensembling or other post-hoc strategies to enhance reasoning. In this work, we introduce a novel approach in which LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. By leveraging DCoT, LLMs gain the ability to refine their reasoning chains within inference, without requiring external feedback. Through a rigorous set of experiments spanning a wide range of tasks that require various reasoning types, we show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales (1.3B to 70B parameters). These improvements are particularly noteworthy for tasks with a large result state space, such as those involving numeric answers. Quantitative analyses and manual evaluations reveal that these gains stem from the models' ability to refine an initial reasoning chain by generating a second, improved chain within the same inference step.
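To make the fine-tuning setup concrete, the sketch below shows one plausible way to pack several diverse reasoning chains into a single training target, so the model learns to emit (and implicitly refine) multiple CoTs in one inference pass. The prompt template, the `CoT i:` delimiters, and the `build_dcot_example` helper are illustrative assumptions, not the paper's verbatim format.

```python
# Minimal sketch of constructing a DCoT-style fine-tuning example.
# Assumption: a single completion concatenates k diverse chains of
# thought followed by the final answer; the exact template and
# delimiters used in the paper may differ.

def build_dcot_example(question: str, chains: list[str], answer: str) -> dict:
    """Pack k diverse CoTs plus the final answer into one target sequence."""
    k = len(chains)
    prompt = f"{question}\nGive {k} chains of thought, then the final answer."
    target = ""
    for i, chain in enumerate(chains, start=1):
        # Later chains can revisit and improve earlier ones within the
        # same generated sequence (within-inference refinement).
        target += f"CoT {i}: {chain}\n"
    target += f"Answer: {answer}"
    return {"prompt": prompt, "completion": target}

example = build_dcot_example(
    question="If a train travels 60 km in 45 minutes, what is its speed in km/h?",
    chains=[
        "45 minutes is 0.75 h, so speed = 60 / 0.75 = 80 km/h.",
        "Per minute the train covers 60/45 = 4/3 km; over 60 minutes that is 80 km.",
    ],
    answer="80 km/h",
)
print(example["completion"])
```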
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: reasoning, reading comprehension, logical reasoning, multihop QA, commonsense QA
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 607