Abstract: Software defect prediction has become a significant research direction in software testing, and the comprehensiveness of defect prediction directly affects testing efficiency and program execution. In practical applications, there is a disconnect between detected software defects and their explanations or repair suggestions: most methods stop at ranking the importance of static code features and fail to provide actionable repair recommendations. Addressing this gap requires reporting both the location and a description of each predicted defect. This paper therefore adopts the Common Weakness Enumeration (CWE) as the defect classification standard. Leveraging the strong code-understanding capabilities of large language models, we design structured prompts, grounded in software engineering principles and prior defect knowledge, for data sampling and labeling. Through a systematic analysis of the quality of the resulting synthetic data, we identify a pool of closed-source models better suited to the task. Experimental results demonstrate that our proposed method, Handpick, which automates the construction of software defect prediction datasets with large language models, provides defect localization and repair suggestions during defect prediction, thereby helping developers rectify software defects more effectively.
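The abstract describes labeling code snippets with a CWE category, a defect location, and a repair suggestion via structured LLM prompts. The sketch below illustrates one way such a labeling step could look; it is a minimal illustration, not the paper's actual pipeline. The prompt wording, the helper name `label_snippet`, and the choice of the OpenAI chat API and model are all assumptions.

```python
# Minimal sketch of a structured-prompt labeling step, assuming the
# official openai>=1.0 Python client. The prompt text, function name,
# and model choice are illustrative, not Handpick's actual design.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """You are a software-defect auditor.
Analyze the following {language} snippet and respond with JSON only:
{{"cwe_id": "<CWE identifier, e.g. CWE-476>",
  "defect_line": <1-based line number of the defect, or null>,
  "description": "<one-sentence explanation of the defect>",
  "repair_suggestion": "<concrete fix>"}}

Snippet:
{code}
"""

def label_snippet(code: str, language: str, model: str = "gpt-4o") -> dict:
    """Ask a closed-source LLM for a CWE label, defect location, and fix."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output simplifies quality analysis
        response_format={"type": "json_object"},  # constrain output to JSON
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(language=language, code=code),
        }],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    snippet = "char *p = NULL;\n*p = 'x';\n"  # trivial null-dereference example
    print(label_snippet(snippet, "C++"))
```

Running such a labeler over sampled snippets from each studied language would yield (snippet, CWE, location, suggestion) records of the kind the paper's dataset construction relies on.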
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: code generation and understanding
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Chinese, English, Java, JavaScript, Python, C++
Submission Number: 2573