Self-Dual: Unifying Natural Language and Programmatic Thinking for Enhanced Mathematical Reasoning in LLMs
Keywords: Mathematical Reasoning; Large Language Models; Natural Language Reasoning; Programmatic Language Reasoning
Abstract: Large language models (LLMs) have made significant progress in mathematical reasoning, yet methods that rely on a single reasoning paradigm exhibit clear limitations. This has motivated recent work to combine multiple paradigms, but existing approaches often fail to systematically exploit their complementary strengths. In this work, we first examine the complementary relationship between natural language (NL) and programmatic language (PL) reasoning, and show that integrating the two leads to consistent improvements in mathematical reasoning performance. Building on this analysis, we introduce Self-Dual, a framework that unifies the two paradigms within a single inference process by generating complementary reasoning trajectories and combining them through structured self-reflection. Beyond inference, we extend this principle to training: we adopt the Self-Dual data format to construct complementary reasoning datasets and evaluate their effectiveness for fine-tuning. We conduct comprehensive evaluations of Self-Dual in both inference and training settings. At inference time, Self-Dual consistently surpasses NL-only, PL-only, and hybrid baselines across multiple benchmarks: DeepSeek-V3-0324 integrated with Self-Dual attains 47.8\% accuracy on AIME25, outperforming Chain-of-Thought (CoT) at 39.2\% and Program-Aided Language models (PAL) at 35.6\%. For training, we fine-tune Qwen2.5-7B-Instruct with only 7.5K MATH samples under the Self-Dual framework to obtain Qwen2.5-7B-SD, which improves MATH500 accuracy by more than 4\% over the base model and surpasses Qwen2.5-Math-7B-Instruct on AIME25. These results demonstrate that Self-Dual effectively exploits complementary reasoning paradigms and substantially enhances the mathematical reasoning ability of LLMs in both inference and training.
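To make the dual-trajectory inference process concrete, below is a minimal Python sketch of the pipeline the abstract describes: one NL trajectory, one PL trajectory, and a self-reflection step that reconciles them. All prompts, the `self_dual_solve` and `run_program` names, and the reconciliation wording are hypothetical illustrations under stated assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of a Self-Dual-style inference loop. The prompts and
# the reconciliation step are assumptions for illustration only; the paper's
# actual prompts and combination logic may differ.
import contextlib
import io
from typing import Callable

LLM = Callable[[str], str]  # any text-in/text-out model endpoint


def run_program(code: str) -> str:
    """Execute model-written Python and capture its printed output.

    Hypothetical helper: a real system would need sandboxing and timeouts.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {"__builtins__": __builtins__})
    return buf.getvalue().strip()


def self_dual_solve(problem: str, llm: LLM) -> str:
    # 1. Natural-language (NL) trajectory: chain-of-thought style reasoning.
    nl_answer = llm(
        "Solve the problem step by step in natural language, "
        f"then state the final answer.\nProblem: {problem}"
    )
    # 2. Programmatic (PL) trajectory: PAL-style, reason by writing code.
    pl_code = llm(
        "Write a Python program whose printed output is the final answer.\n"
        f"Problem: {problem}"
    )
    pl_answer = run_program(pl_code)
    # 3. Structured self-reflection: the model inspects both trajectories,
    #    checks each for errors, and produces one reconciled answer.
    return llm(
        "Two solutions to the same problem are given below. Verify each, "
        "reconcile any disagreement, and output the final answer.\n"
        f"Problem: {problem}\n"
        f"NL solution: {nl_answer}\n"
        f"PL solution (program output): {pl_answer}"
    )
```

In this sketch, any model endpoint can be plugged in as the `llm` callable; the design keeps the two trajectories independent so that the reflection step sees genuinely complementary evidence rather than one paradigm conditioned on the other.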
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 4076