Abstract: Software defect prediction has become a significant research direction in software testing, and the comprehensiveness of defect prediction directly affects testing efficiency and program execution. In practical applications, there is a disconnect between detected software defects and their explanations or repair suggestions: most methods stop at ranking the importance of static code features and fail to provide actionable repair recommendations. Addressing this gap requires reporting both the location and a description of each predicted defect. This paper therefore adopts the Common Weakness Enumeration (CWE) as the defect classification standard. Leveraging the strong code-understanding capabilities of large language models, we design structured prompts, grounded in software engineering principles and prior defect knowledge, for data sampling and labeling. Through a systematic analysis of the quality of the resulting synthetic data, we identify a pool of closed-source models better suited to the task. Experimental results demonstrate that our proposed method, Handpick, which automates the construction of software defect prediction datasets with large language models, provides defect localization and repair suggestions during defect prediction, thereby helping developers rectify software defects more effectively.
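The abstract describes labeling code snippets with a CWE category, a defect location, and a repair suggestion via structured LLM prompts. The sketch below illustrates one way such a labeling step could look; it is a minimal illustration, not the paper's actual pipeline. The prompt wording, the helper name `label_snippet`, and the choice of the OpenAI chat API and model are all assumptions.

```python
# Minimal sketch of a structured-prompt labeling step, assuming the
# official openai>=1.0 Python client. The prompt text, function name,
# and model choice are illustrative, not Handpick's actual design.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """You are a software-defect auditor.
Analyze the following {language} snippet and respond with JSON only:
{{"cwe_id": "<CWE identifier, e.g. CWE-476>",
  "defect_line": <1-based line number of the defect, or null>,
  "description": "<one-sentence explanation of the defect>",
  "repair_suggestion": "<concrete fix>"}}

Snippet:
{code}
"""

def label_snippet(code: str, language: str, model: str = "gpt-4o") -> dict:
    """Ask a closed-source LLM for a CWE label, defect location, and fix."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output simplifies quality analysis
        response_format={"type": "json_object"},  # constrain output to JSON
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(language=language, code=code),
        }],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    snippet = "char *p = NULL;\n*p = 'x';\n"  # trivial null-dereference example
    print(label_snippet(snippet, "C++"))
```

Running such a labeler over sampled snippets from each studied language would yield (snippet, CWE, location, suggestion) records of the kind the paper's dataset construction relies on.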
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: code generation and understanding
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Chinese, English, Java, JavaScript, Python, C++
Submission Number: 2573