Abstract: Recent research focuses on utilizing more test-time computation to enhance the performance of Large Language Models (LLMs) on complex mathematical and logical reasoning tasks. However, these methods allocate additional computational resources during inference to explore the solution space with tree search methods such as Monte Carlo Tree Search (MCTS), resulting in a significant increase in inference time. In this paper, we construct atomic reasoning steps, which are subsequently used to drive MCTS-based self-preference learning that enhances the reasoning capabilities of LLMs without relying on a larger model for data distillation. Extensive evaluations on various mathematical and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. For instance, our approach outperforms the Qwen2.5-7B-Instruct baseline on MATH, GSM8K, and ARC with substantial accuracy gains, reaching 50.0% (+14.2%), 92.1% (+10.4%), and 89.6% (+13.3%), respectively.
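To make the abstract's pipeline concrete, below is a minimal, self-contained sketch of how MCTS over atomic reasoning steps could surface higher- and lower-value reasoning chains for self-preference learning. All names here (ReasoningNode, propose_steps, rollout_reward, mcts_search) are hypothetical illustrations, not the paper's actual implementation; the step proposer and reward function are stubbed placeholders for LLM sampling and answer verification.

```python
import math
import random

class ReasoningNode:
    """A node holding a partial chain of atomic reasoning steps."""

    def __init__(self, steps, parent=None):
        self.steps = steps          # partial chain of atomic reasoning steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c=1.4):
        # Standard UCT score: exploit average value, explore rarely visited nodes.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def propose_steps(steps, k=3):
    # Placeholder: in practice, sample k candidate next atomic steps from the LLM.
    return [steps + [f"step_{len(steps)}_{i}"] for i in range(k)]

def rollout_reward(steps):
    # Placeholder: in practice, complete the chain and verify the final answer,
    # e.g. return 1.0 if it checks out and 0.0 otherwise.
    return random.random()

def mcts_search(root_steps, iterations=100):
    root = ReasoningNode(root_steps)
    for _ in range(iterations):
        # 1. Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # 2. Expansion: attach candidate next atomic steps.
        for steps in propose_steps(node.steps):
            node.children.append(ReasoningNode(steps, parent=node))
        # 3. Simulation: score one new child via rollout.
        child = random.choice(node.children)
        reward = rollout_reward(child.steps)
        # 4. Backpropagation: update statistics up to the root.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return root
```

In a full pipeline, high- versus low-value sibling chains from the resulting search tree would plausibly be paired as chosen/rejected examples for preference optimization (e.g., DPO-style training), which is one way the self-preference learning described in the abstract could be realized.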
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning; prompting; continual learning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1623