Defect-Introducing Defect Prediction Testing

Published: 2024, Last Modified: 07 Feb 2025, QRS Companion 2024, License: CC BY-SA 4.0
Abstract: Machine learning (ML) innovations have significantly advanced the field of defect prediction in software development, offering the potential for automated error detection across vast codebases. These advancements promise to elevate software quality assurance by improving reliability and security. However, despite their benefits, existing defect prediction models are not without limitations, often failing to identify vulnerabilities or inaccurately flagging non-defective code segments as problematic. To overcome these shortcomings, this study introduces DPTester, a novel approach that diverges from traditional label-preserving transformations, aiming to more accurately mimic the complexities of real-world software development. Unlike previous methods, DPTester intentionally injects defects into source code, fundamentally transforming its semantics. This methodology tests a defect prediction model's ability to identify and anticipate defects amid semantic alterations. DPTester employs a two-step framework consisting of automated test input generation and oracle generation. The first phase generates test inputs by modifying conditional statements to induce potential defects. The second phase evaluates the defect prediction models' performance on these inputs; a model's failure to detect a DPTester-introduced defect is counted as an issue. Our evaluation of DPTester on two prominent defect prediction models, CodeT5+ and CodeBERT, involved the generation of 222,112 test inputs, 99% of which were valid test scenarios. The evaluation exposed significant weaknesses in both models: the accuracy of CodeT5+ and CodeBERT dropped to 43% and 30%, respectively, on DPTester's test scenarios. Moreover, our analysis uncovered 144,104 and 98,574 prediction inconsistencies in CodeBERT and CodeT5+, respectively, underscoring the urgent need for model improvements to handle such sophisticated testing scenarios.
Furthermore, DPTester's operational efficiency (averaging 0.002 seconds per generated test input and 0.025 seconds per detected issue) positions it as a valuable addition to automated testing frameworks, and further highlights the need for defect prediction models that can navigate complex testing environments effectively.
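The abstract describes test input generation via mutation of conditional statements. As a minimal illustrative sketch (not the authors' implementation; the function name and the choice of operator-flipping mutation are assumptions), a defect-introducing transformation on Python code could invert the comparison in a function's first `if` statement:

```python
import ast

# Map each comparison operator to its logical inverse. Flipping the operator
# changes program semantics, injecting a defect (unlike label-preserving
# transformations, which keep behavior intact).
_FLIP = {ast.Lt: ast.GtE, ast.GtE: ast.Lt, ast.Gt: ast.LtE,
         ast.LtE: ast.Gt, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}

def inject_condition_defect(source: str) -> str:
    """Return `source` with the comparison in the first `if` inverted.

    Hypothetical helper mirroring DPTester's idea of mutating conditional
    statements to produce defective test inputs for a prediction model.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.If) and isinstance(node.test, ast.Compare):
            new_op = _FLIP.get(type(node.test.ops[0]))
            if new_op is not None:
                node.test.ops[0] = new_op()  # e.g. `>=` becomes `<`
                break
    return ast.unparse(tree)  # requires Python 3.9+

original = """
def is_adult(age):
    if age >= 18:
        return True
    return False
"""
print(inject_condition_defect(original))  # `if age >= 18` becomes `if age < 18`
```

The mutated function is then fed to the defect prediction model; if the model still labels it non-defective, that counts as an issue under the oracle described above.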