RamPO: Retrieval-Augmented Monte Carlo Tree Search Preference Optimization for Multi-Hop Question Answering

ACL ARR 2025 May Submission 999 Authors

16 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) achieve impressive performance on NLP tasks. Nevertheless, multi-hop question answering (QA) requires exploring a vast search space of possible reasoning paths, which leads to performance degradation. Recent methods often enhance multi-hop reasoning by relying on inference scaling laws (e.g., performing multiple rollouts to improve reasoning), which significantly increases latency, or on retrieval-augmented generation (RAG), which requires additional valid reasoning over the retrieved content. However, real-time QA scenarios demand that LLMs explore the search space for valid reasoning within limited time budgets. In this work, we propose Retrieval-Augmented Monte Carlo Tree Search Preference Optimization (RamPO), which integrates Monte Carlo Tree Search (MCTS) with a comprehensive action sequence tailored for RAG settings. By leveraging MCTS-guided heuristic exploration to constrain the search space and aligning the model offline with the preferences identified by MCTS, RamPO offers a trade-off between latency and reasoning accuracy during search-space exploration at online inference time. Experiments on three multi-hop QA datasets show that RamPO achieves an average performance improvement of 12.3% over recent top-performing methods, while being up to 156.2× faster than existing tree-like inference approaches. Our code is available at https://github.com/NLPwang/RamPO.
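To illustrate the offline alignment idea described in the abstract, the sketch below shows one plausible way MCTS value estimates over reasoning steps could be converted into preference pairs for DPO-style training. This is a minimal conceptual example, not the authors' released implementation; the names (`ReasoningNode`, `build_preference_pairs`, `margin`) and the pairing heuristic are assumptions.

```python
# Conceptual sketch: turn MCTS value estimates into offline preference pairs.
# All names and the sibling-pairing heuristic are hypothetical, not from the paper's code.
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReasoningNode:
    """One reasoning step (e.g., a retrieval query or an intermediate answer)."""
    text: str
    parent: Optional["ReasoningNode"] = None
    children: List["ReasoningNode"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0  # accumulated reward from rollouts

    def q(self) -> float:
        """Mean value estimate of this step."""
        return self.value / self.visits if self.visits else 0.0

    def uct(self, c: float = 1.4) -> float:
        """Standard UCT score used during MCTS to pick which child to expand."""
        if self.visits == 0:
            return float("inf")
        return self.q() + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def build_preference_pairs(root: ReasoningNode, margin: float = 0.1):
    """Pair sibling steps whose value estimates differ by at least `margin`.
    The higher-valued step becomes 'chosen', the lower one 'rejected'."""
    pairs = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = [k for k in node.children if k.visits > 0]
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                if abs(a.q() - b.q()) >= margin:
                    chosen, rejected = (a, b) if a.q() > b.q() else (b, a)
                    pairs.append({"prompt": node.text,
                                  "chosen": chosen.text,
                                  "rejected": rejected.text})
        stack.extend(node.children)
    return pairs


if __name__ == "__main__":
    # Toy tree: one question with a retrieval-based step and a direct-answer step.
    root = ReasoningNode("Q: Who directed the film starring X?", visits=8)
    good = ReasoningNode("Retrieve: films starring X", parent=root, visits=5, value=4.0)
    bad = ReasoningNode("Answer directly without retrieval", parent=root, visits=3, value=0.5)
    root.children = [good, bad]
    print(build_preference_pairs(root))
```

Under this reading, the expensive tree search is run only offline to generate training pairs, and the aligned model then reasons in a single pass at inference time, which is consistent with the latency/accuracy trade-off claimed in the abstract.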
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: multihop QA, reasoning, open-domain QA
Contribution Types: NLP engineering experiment, Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 999