Keywords: Large Language Models, LLM Reasoning
Abstract: A widely used approach for enhancing the reasoning capabilities of large language models (LLMs) is to generate diverse reasoning paths by varying the context (e.g., demonstrations, prompts, or instructions) or the sampling method (e.g., top-k, top-p, or beam search) and then to select appropriate paths via majority voting or verifier-based strategies. Although both context variation and sampling techniques can produce diverse content, sampling alone does not substantially increase the diversity of generations. Context variation, by contrast, fosters greater diversity in reasoning but can also introduce negative effects, so switching contexts does not necessarily yield proportional performance improvements. There is therefore a need to investigate how context influences LLM generation and to mitigate its adverse impacts. The primary challenge is that comparative study becomes impossible once reasoning paths generated under different contexts diverge: once the predicted tokens at a given step differ, it is unclear whether subsequent tokens in the inference path are influenced by the context or by the content already generated. In this paper, we propose a Cross-Generation Reasoning Tree (CGRT) algorithm for studying the impact of different contexts on LLM generation and for enhancing LLMs' reasoning performance. Experimental findings reveal that, beyond improving interpretability, CGRT integrates the positive effects of context and sampling strategies more effectively than previous approaches, leading to more rational inference paths. Experiments on Llama2, Llama3, and Qwen demonstrate that, when generating an equal number of diverse inference paths, those produced via the "reasoning tree" method achieve higher accuracy.
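The selection step the abstract refers to (choosing among diverse sampled reasoning paths by majority voting, often called self-consistency) can be sketched minimally as follows; the function name and the example answers are illustrative, not taken from the paper.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers extracted from five reasoning paths
# sampled under varied contexts or sampling parameters.
paths = ["42", "41", "42", "42", "17"]
print(majority_vote(paths))  # -> 42
```

A verifier-based strategy would instead score each path with a learned verifier and pick the highest-scoring one; majority voting is the simpler baseline.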
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6714