Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a novel test-time scaling framework that aligns the logic between LLM-generated programs and their reported reasoning steps to produce a more reliable reasoning path.
Abstract: Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of large language models (LLMs) by generating natural language (NL) rationales that lead to the final answer. However, it struggles with numerical computation, which has motivated the development of program-aided techniques. Despite their potential, a persistent challenge remains: inconsistencies between LLM-reported reasoning steps and the logic in generated programs, which we term "reasoning hallucinations." These stem from the inherent ambiguities of NL and the statistical nature of LLMs, which often lack rigorous logical coherence. To address this challenge, we propose a novel test-time scaling framework, Reasoning-as-Logic-Units (RaLU), which constructs a more reliable reasoning path by aligning logical units between the generated program and their corresponding NL descriptions. By decomposing the initially generated program into discrete units using static analysis, RaLU engages in an iterative dialogue with the LLM to judge, refine, and explain each unit. A rewind-and-correct mechanism ensures alignment between code statements and task requirements in each unit, ultimately forming a cohesive reasoning path under the program's logic, from which the model reaches a final solution. Our experiments demonstrate that RaLU significantly outperforms existing baselines in mathematical reasoning (GSM8K, MATH) and algorithmic reasoning (HumanEval+, MBPP+), underscoring its potential to advance LLM reasoning and programming by offering enhanced accuracy and interpretability.
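
To make the judge → rewind-and-correct → explain cycle described in the abstract more concrete, below is a minimal Python sketch of such a loop. It is an illustrative approximation under stated assumptions, not the authors' implementation: the `llm_chat` helper stands in for any chat-completion API, the per-statement decomposition via Python's `ast` module is a stand-in for the paper's static analysis, and the prompts and rewind budget are hypothetical.

```python
# Minimal sketch of a RaLU-style alignment loop. NOT the authors' code:
# `llm_chat` is a hypothetical chat-completion helper, and the unit
# decomposition here simply takes top-level statements via `ast`,
# whereas the paper uses static analysis over the generated program.
import ast


def decompose_into_units(program: str) -> list[str]:
    """Split a generated program into discrete logic units (here: top-level statements)."""
    tree = ast.parse(program)
    return [ast.get_source_segment(program, node) for node in tree.body]


def ralu(task: str, program: str, llm_chat, max_rewinds: int = 3) -> str:
    units = decompose_into_units(program)
    reasoning_path = []  # aligned (code unit, NL explanation) pairs
    i = 0
    while i < len(units):
        unit = units[i]
        # 1) Judge: is this unit logically consistent with the task requirements?
        verdict = llm_chat(
            f"Task: {task}\nUnit:\n{unit}\n"
            "Is this unit logically consistent with the task? Answer yes/no and why."
        )
        if verdict.strip().lower().startswith("no") and max_rewinds > 0:
            # 2) Rewind-and-correct: revise the unit, then re-judge it.
            units[i] = llm_chat(
                f"Task: {task}\nFaulty unit:\n{unit}\nRewrite this unit so it matches the task."
            )
            max_rewinds -= 1
            continue
        # 3) Explain: attach an NL rationale aligned with the unit's logic.
        explanation = llm_chat(f"Explain what this unit does and why it is needed:\n{unit}")
        reasoning_path.append((unit, explanation))
        i += 1
    # 4) Derive the final answer from the aligned reasoning path.
    joined = "\n".join(f"{code}\n# {why}" for code, why in reasoning_path)
    return llm_chat(
        f"Task: {task}\nAligned reasoning path:\n{joined}\nGive the final answer."
    )
```

The key design point this sketch tries to capture is that each unit is only admitted to the reasoning path after the model has judged it against the task and, if needed, rewound and corrected it, so the final answer is derived from NL explanations that track the program's logic rather than from free-form rationales.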
Lay Summary: Large language models excel at generating text but often struggle with precise reasoning tasks, such as solving math or algorithm problems. Existing methods, like step-by-step explanations or generating code snippets, can still produce logical inconsistencies, for example, describing a correct step but implementing it incorrectly. This paper introduces RaLU, a new framework that improves reasoning accuracy by aligning natural language explanations with code logic. RaLU breaks down a problem-solving process into smaller, verifiable units, such as checking conditions or loops in code. It then engages the model in a self-checking dialogue to validate, refine, and explain each unit. This iterative process makes every step in the reasoning path more logically sound and in line with the problem requirements. Experiments demonstrate that RaLU significantly outperforms existing methods in mathematical and coding tasks. By combining the clarity of natural language with the rigor of code, RaLU makes AI reasoning more reliable and transparent, paving the way for trustworthy applications in education, programming, and decision-making.
Link To Code: https://github.com/DeepAccept/RaLU
Primary Area: Deep Learning->Large Language Models
Keywords: large language models, LLMs, reasoning, code generation
Submission Number: 9605