MULocBench: Beyond Code Benchmark for Multi-Type Software Issue Localization

Zejun Zhang; Jian Jornbowrl Wang; Qingyun Yang; Yifan Pan; tangyi; Yi Li; Zhenchang Xing; Tian Zhang; Xuandong Li; Guoan Zhang

16 Sept 2025 (modified: 26 Jan 2026). ICLR 2026 Conference Withdrawn Submission. License: CC BY 4.0.

Keywords: Issue Localization; LLM; SWE-Bench

Abstract: Accurately localizing the project locations (e.g., files and functions) relevant to an issue is a critical first step in software maintenance. However, existing benchmarks for issue localization, such as SWE-Bench and LocBench, are limited: they focus predominantly on issues resolved through pull requests and on code locations, ignoring other resolution evidence, such as commits and comments, and non-code files, such as configurations and documentation. To address this gap, we introduce MULocBench, a comprehensive dataset of 1,100 issues from 46 popular GitHub Python projects. Compared with existing benchmarks, MULocBench offers greater diversity in issue types, root causes, location scopes, and file types, providing a more realistic testbed for evaluation. Using this benchmark, we assess the performance of state-of-the-art localization methods and five LLM-based prompting strategies. Our results reveal significant limitations in current techniques: even at the file level, Acc@5 remains below 40%. This underscores the difficulty of generalizing to realistic, multi-faceted issue resolution. To enable future research on project localization for issue resolution, we publicly release MULocBench at https://huggingface.co/datasets/somethingone/MULocBench.
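The abstract reports file-level Acc@5 without defining it here. As a hedged illustration only, the sketch below assumes the strict variant of Acc@k used in some localization work: an issue counts as a hit only if every ground-truth file appears among the method's top-k ranked predictions. The function name `acc_at_k`, the dictionary layout, and the toy file paths are hypothetical and are not taken from MULocBench's data schema or evaluation code, which may define the metric differently.

```python
from typing import Mapping, Sequence, Set


def acc_at_k(
    predictions: Mapping[str, Sequence[str]],
    ground_truth: Mapping[str, Set[str]],
    k: int = 5,
) -> float:
    """Hypothetical Acc@k: fraction of issues whose ground-truth files
    all appear in the top-k ranked predictions.

    Assumes the strict "all locations found" variant; a lenient variant
    would instead require only a non-empty intersection with the top k.
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for issue_id, gold_files in ground_truth.items():
        # Issues with no predictions yield an empty top-k set (a miss).
        top_k = set(list(predictions.get(issue_id, []))[:k])
        # An issue with an empty gold set is also counted as a miss.
        if gold_files and gold_files.issubset(top_k):
            hits += 1
    return hits / len(ground_truth)


# Hypothetical toy data (not drawn from MULocBench), mixing code and
# non-code files to mirror the benchmark's broader location scope.
gold = {
    "issue-1": {"src/app.py"},
    "issue-2": {"docs/config.md", "setup.cfg"},
}
preds = {
    "issue-1": ["tests/test_app.py", "src/app.py"],
    "issue-2": ["setup.cfg", "README.md"],
}
print(acc_at_k(preds, gold, k=5))  # 0.5
```

Here issue-1 is a hit because its single gold file is ranked second, while issue-2 is a miss because `docs/config.md` never appears in the predictions, which is the kind of non-code location the abstract argues current techniques overlook.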

Primary Area: datasets and benchmarks

Submission Number: 7129
