Abstract: Recent research focuses on utilizing more test-time computation to enhance the performance of Large Language Models (LLMs) on complex mathematical and logical reasoning tasks. However, these methods allocate additional computational resources during inference to explore the solution space with tree search methods such as Monte Carlo Tree Search (MCTS), resulting in a significant increase in inference time. In this paper, we construct atomic reasoning steps, which are subsequently used to drive MCTS-based self-preference learning that enhances the reasoning capabilities of LLMs without relying on a larger model for data distillation. Extensive evaluations on various mathematical and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. For instance, our approach outperforms the Qwen2.5-7B-Instruct baseline on MATH, GSM8K, and ARC with substantial accuracy gains, reaching 50.0% (+14.2%), 92.1% (+10.4%), and 89.6% (+13.3%), respectively.
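To make the abstract's pipeline concrete, below is a minimal, self-contained sketch of how MCTS over atomic reasoning steps could surface higher- and lower-value reasoning chains for self-preference learning. All names here (ReasoningNode, propose_steps, rollout_reward, mcts_search) are hypothetical illustrations, not the paper's actual implementation; the step proposer and reward function are stubbed placeholders for LLM sampling and answer verification.

```python
import math
import random

class ReasoningNode:
    """A node holding a partial chain of atomic reasoning steps."""

    def __init__(self, steps, parent=None):
        self.steps = steps          # partial chain of atomic reasoning steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c=1.4):
        # Standard UCT score: exploit average value, explore rarely visited nodes.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def propose_steps(steps, k=3):
    # Placeholder: in practice, sample k candidate next atomic steps from the LLM.
    return [steps + [f"step_{len(steps)}_{i}"] for i in range(k)]

def rollout_reward(steps):
    # Placeholder: in practice, complete the chain and verify the final answer,
    # e.g. return 1.0 if it checks out and 0.0 otherwise.
    return random.random()

def mcts_search(root_steps, iterations=100):
    root = ReasoningNode(root_steps)
    for _ in range(iterations):
        # 1. Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # 2. Expansion: attach candidate next atomic steps.
        for steps in propose_steps(node.steps):
            node.children.append(ReasoningNode(steps, parent=node))
        # 3. Simulation: score one new child via rollout.
        child = random.choice(node.children)
        reward = rollout_reward(child.steps)
        # 4. Backpropagation: update statistics up to the root.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return root
```

In a full pipeline, high- versus low-value sibling chains from the resulting search tree would plausibly be paired as chosen/rejected examples for preference optimization (e.g., DPO-style training), which is one way the self-preference learning described in the abstract could be realized.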
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning; prompting; continual learning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1623