Keywords: large language models, code question answering, software repository understanding
TL;DR: We propose DeepRepoQA, a repository question answering (QA) approach in which LLM agents find answers through a systematic tree search over structured action spaces.
Abstract: Effectively answering developer questions about a software repository is a critical yet under-explored problem in software engineering. While existing repository understanding methods have advanced the field, they predominantly rely on surface-level code retrieval and cannot reason deeply across multiple files and complex software architectures or ground answers in long-range code dependencies. To address these limitations, we propose DeepRepoQA, a repository question answering (QA) approach for realistic code environments. DeepRepoQA builds on an agentic framework in which LLM agents find answers through a systematic tree search over structured action spaces. Our key innovations include Monte Carlo Tree Search (MCTS) that balances exploration and exploitation for multi-hop repository reasoning, and LLM feedback that provides learned priors and values to reduce both search depth and drift. The system maintains structured memory paths that enable reliable evidence synthesis and traceable reasoning steps. Comprehensive experiments on SWE-QA demonstrate substantial performance gains over strong baselines, validating the effectiveness of systematic MCTS-guided exploration for multi-hop repository reasoning.
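To make the search loop described in the abstract concrete, the following is a minimal sketch of MCTS guided by priors and values. It is not DeepRepoQA's implementation: the abstract does not state the selection rule, so a PUCT-style rule is assumed here, and the LLM feedback is replaced by hypothetical stubs (`uniform_prior`, `toy_value`). The toy repository graph `TOY_REPO`, the `Node` class, and all function names are illustrative assumptions. Each node's `state` stands in for a memory path, i.e. the sequence of files opened so far.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    state: tuple                  # actions taken so far (stand-in for a memory path)
    prior: float = 1.0            # prior from the stubbed "LLM" policy (assumption)
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploitation (mean value) plus a prior-weighted exploration bonus.
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.mean_value() + bonus


def search(root: Node, actions_fn, prior_fn, value_fn, n_simulations: int = 50) -> Node:
    for _ in range(n_simulations):
        node, path = root, [root]
        # Selection: descend through expanded nodes by PUCT score.
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: puct_score(parent, ch))
            path.append(node)
        # Expansion: add one child per available action, weighted by the prior.
        actions = actions_fn(node.state)
        if actions:
            priors = prior_fn(node.state, actions)
            for a in actions:
                node.children[a] = Node(state=node.state + (a,), prior=priors[a])
        # Evaluation: the value function replaces a random rollout.
        value = value_fn(node.state)
        # Backup: propagate the value along the visited path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    return root


def most_visited_path(root: Node) -> tuple:
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.state


# Hypothetical toy repository: file -> files it references.
TOY_REPO = {"cli.py": ["config.py", "utils.py"], "config.py": ["loader.py"],
            "utils.py": [], "loader.py": []}


def toy_actions(state: tuple) -> list:
    return ["cli.py"] if not state else TOY_REPO[state[-1]]


def uniform_prior(state: tuple, actions: list) -> dict:
    # Stub for an LLM-provided prior: uniform over the available actions.
    return {a: 1.0 / len(actions) for a in actions}


def toy_value(state: tuple) -> float:
    # Stub for an LLM-provided value: 1.0 once the file holding the answer is reached.
    return 1.0 if "loader.py" in state else 0.0


if __name__ == "__main__":
    root = search(Node(state=()), toy_actions, uniform_prior, toy_value)
    print(most_visited_path(root))
```

In this toy setting the answer sits two hops from the entry point, so the most-visited path has to pass through `config.py` before reaching `loader.py`. In a real system, the prior and value stubs would be replaced by model calls, and the action space would be richer than "open a referenced file".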
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6861