Keywords: Trustworthy Machine Learning, Explainability, Interpretability, Faithfulness, Large Language Models
TL;DR: We explore approaches to improve the faithfulness of CoT reasoning generated by large language models, and present their shortcomings.
Abstract: As Large Language Models (LLMs) are increasingly employed in critical domains such as healthcare, it is essential to make these models trustworthy. In this pursuit, Chain-of-Thought (CoT) prompting has emerged as a potential source of transparency in LLMs. While CoT reasoning is appealing to humans, prior studies have shown that these reasoning chains are not faithful, i.e., they do not accurately reflect the underlying LLM's behavior. Ensuring the faithfulness of LLM-generated CoT reasoning is crucial for decision-makers, who rely on it to determine if, when, and to what extent to trust the recommendations made by these models. While several works have proposed strategies to enhance accuracy and truthfulness in LLMs, there has been little exploration of the effectiveness of these common strategies for enhancing the faithfulness of CoT reasoning. Specifically, we explore the promise of in-context learning, fine-tuning, and activation editing for improving the faithfulness of CoT reasoning. Our empirical analyses on benchmark tasks indicate that these strategies offer limited success, with only slight performance enhancements in controlled scenarios. Activation editing demonstrated minimal success, while fine-tuning and in-context learning achieved marginal improvements that failed to generalize across reasoning and truthful question-answering benchmarks. We subsequently analyse what makes faithful CoT reasoning challenging, and present findings to lay the groundwork for future research in trustworthy reasoning from LLMs. In summary, our work underscores the inherent difficulty of eliciting faithful CoT reasoning from LLMs, suggesting that the current array of approaches may not be sufficient to address this challenge.
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12455